EDITOR’S INTRODUCTION


THE past quarter of a century has been a most fruitful period
in the development of Education, not only as a teaching
subject, but as a means for the experimental study of educa- 
tional problems as well. During this period of time much 
new subject-matter for the instruction of students has been 
developed, and much new technique has been worked out 
and made applicable to the treatment of the results of in- 
vestigation. It is not too much to say that the subject- 
matter of Education has been entirely made over during 
this twenty-five-year period. The new teaching material 
and technique which have been evolved have been of many 
different types, but no aspect of this development, during 
the past decade and a half, has awakened more widespread 
interest, challenged the thinking of more young workers, or 
been more fruitful in results than the creation of tests and 
the application of statistical procedures to the interpreta- 
tion of the results obtained. 

The test movement has taken two main directions. One 
has been the creation of educational tests, by means of 
which we have been able to measure the results of the teach- 
ing process in many of its special phases; the other has been 
the evolution of mental tests of a number of types, by 
means of which we have sought to determine general intelli- 
gence and special aptitudes for training. The first aspect of 
the movement has been longer under way, and has by now 
resulted in an extensive series of educational tests for the 
measurement of instructional results in the different school 
subjects; the other has resulted in the evolution of individ- 
ual and group tests for general intelligence, various types of 
scales, and, more recently, the application of the idea to the
creation of personality and aptitude tests of a number of
types. It is in this second phase of the test movement that
the largest creative work is now being done.

For the educational tests we have for some time had a
number of well-organized teaching texts covering the field,
so that the student has been able to find, in the compass of a
single volume, a good statement as to the nature, usefulness,
technique of giving and scoring, and norms for the more
important educational tests. For the field of mental tests,
on the contrary, we have until now had no such comprehen-
sive single volume. It is this lack that the author of the
present number in this series of textbooks has aimed to
supply. What Monroe, DeVoss, and Kelly have done so
well for the first phase of the test movement in their Educa-
tional Tests and Measurements, the author of the present
volume has now done for the field of mental tests.

In the text now presented the author has shown how the
mental test idea was evolved out of the laboratory study of
individual differences by psychologists, how the individual
and then the group intelligence tests were developed, the
application of statistical methods to the interpretation of
the results, the creation of the different types of scales, the
extension of the mental test idea in new directions, the
technique and theory of the tests, the uses of the different
types of mental tests, and their reliability, and has closed
his treatment with two chapters on the interpretation of
what the tests really measure and the nature of intelligence
itself. The work of hundreds of individual investigators has
been organized into a systematic treatise, and the place and
work of each have been given their proper setting as parts of
a great movement. The volume is accordingly offered to
teachers of college and university classes in Mental Tests
with confidence that it will prove as useful in this field as the
texts now in use have done in the field of educational tests.

ELLWOOD P. CUBBERLEY


PREFACE


...important types of mental tests. No book, so far as I am
aware, has appeared which has this scope. A number of
books treat of special kinds or aspects of mental tests,
chiefly of intelligence tests, but none includes a description
of intelligence tests, of tests of special capacities, and of
non-intellectual or personality tests. All these kinds of
tests are important in their practical application, and they
all involve much the same principles. It seems desirable,
therefore, to treat them together.

The general descriptive part of the book is organized
historically. The historical approach gives a convenient
means of introducing the various types of tests, and at the
same time gives the best basis for the appreciation of both
the value and the limitations of tests.

Throughout the book there is an emphasis on principles as
contrasted with the mere surface facts concerning mental
tests. The aim is to reveal the scientific problems which are
involved in the design, application, and interpretation of
tests, and not merely to prepare a manual for training.

Due to the recency of the development of mental tests,
many of the principles which are involved in them are not
very fully agreed upon, and some are not yet very clearly
recognized. In such a case, where the points under discus-
sion are still matters of debate, and where some of them are
matters of individual opinion or interpretation, it seemed
desirable to present various points of view, together with
the evidence and the conclusions of the author. At the risk
of departing from the prevailing practice of textbook writ-
ing, I have followed this procedure. My own conviction is
that this method, which encourages the reader to weigh the
evidence for himself, is preferable to a more dogmatic ex-
position, both for the student and the reader in general. It
is my hope that, on account of the wide popular interest in
mental tests and their interpretation, this discussion may be
of interest to laymen who may wish to inform themselves
upon them, as well as to students of psychology and educa-
tion.

The point of view which I have adopted upon the most
widely debated issues of interpretation is in agreement with
neither extreme. I have endeavored to weigh the evidence
as impartially as possible, and this evidence appears to me to
indicate that mental tests, particularly intelligence tests,
measure native capacity in part and education or training in
part.

Certain chapters are more technical than the remainder
and may be omitted by the reader who is not interested in
them without seriously breaking the continuity. These
chapters are those on “The Technique and Theory of Men-
tal Tests,” on “How to Tabulate the Results of Tests,” and
on “Mental Growth” (Chapters IX, X, XI, XII, and
XIII).

I wish to thank my colleagues, Professor A. W. Korn-
hauser and Professor Karl Holzinger, for suggestions re-
garding certain points in the manuscript, the former con-
cerning the chapter on “The Application of Mental Tests to
Vocational Guidance and Selection,” and the latter espe-
cially on questions of technique and interpretation. I am
indebted to Dr. H. H. Goddard for reading the chapter on
“Intelligence and Delinquency.” To the authors of the
tests which are listed in the table at the end of Chapter VI
I am greatly indebted for supplying me much of the in-
formation which is there collected. I wish also to thank the
authors and publishers who have given permission to copy
illustrations. Acknowledgment is made by name in each
case.

FRANK N. FREEMAN


MENTAL TESTS


CHAPTER I 


INTRODUCTION: PRESENT STATUS OF MENTAL 
TESTS 


MENTAL tests are of recent origin. They grew out of the
study of individual differences in the psychological labora- 
tory. The study of individual differences, in turn, grew 
out of the experimentation which had for its aim the dis- 
covery of general principles or general laws concerning 
human behavior. At the beginning of this experimentation 
the individual variations which occurred in the course of 
experiments were regarded either as errors or as negligible 
quantities. After a time psychologists recognized that these 
differences were real and that they deserved to be studied 
directly. The study of these individual differences for them- 
selves began about thirty-five years ago. This was some 
fifteen years after the founding of the first important psy- 
chological laboratory by Wilhelm Wundt. 

The scientific interest in individual differences and their 
measurement began to develop about 1890. For ten or 
fifteen years tests were tried out in the psychological labora- 
tories of the universities. The educational interest in tests 
may be said to have begun about 1905. The development 
of practical tests for use in schools, therefore, began about 
twenty years ago. The interest in tests before this time was 
fostered largely by professors of psychology, and their ex-
periments were carried on largely with the college students.
These experiments had very little immediate application to
educational problems. They laid the foundation, however, 
for the development of tests which could be used for the 
practical differentiation of the pupils in the school. 

The beginning of the development of practical tests is 
represented in the work of Binet. Binet, like other psychol-
ogists, had been experimenting with tests during the decade
1890-1900, but his labors during this early period had been
of little more practical value than those of other experi-
menters. During the first decade of the present century,
however, he succeeded in developing the scale which over- 
came the shortcomings of the earlier tests and which proved 
to be of immense practical value. The outstanding charac- 
teristic of this decade was the development of the individual 
scale of the type of the well-known Binet-Simon scale. 
This test was applied particularly to the discovery of back- 
ward, subnormal, and feeble-minded children, in order that 
they might be assigned to special classes. 

While the dominant interest during this period was in 
groups of tests which are represented by the age scales of 
the type of the Binet Scale, there was also a considerable 
amount of experimentation going on with single tests by 
means of the method of correlation statistics. While the 
study of these single tests by means of the correlation 
method did not at the beginning prove very fruitful, it led, 
during the following decade, to types of experimentation 
and the development of types of tests which proved to have 
still wider application than the individual tests of the age-
scale type. These later tests are the prevalent group-point
scales. The past decade saw an enormous development of
these group tests, which can be applied conveniently to 
children on a large scale. The most important factor in the 
large-scale development of these tests was, of course, the 
World War, which was the occasion of the production of the
army scales, and of scales patterned after them for use in the
schoolroom. Coincident with the extension of tests came 
the shifting of interest from the backward or feeble-minded 
child to the normal child, and particularly to the child of 
unusual ability. 

Within the last five years an enormous number of group 
tests has been given. Over 1,700,000 men were given the 
Army Alpha Test. Following the War, a committee of 
psychologists who had been concerned with the develop- 
ment of the army tests formulated the National Intelligence 
Test. Within less than a year after this test was issued, 
over 575,000 copies were sold. During the year 1922-23, 
800,000 copies of this test were distributed. During the 
same year one firm which deals particularly in mental tests 
had sold over 2,500,000 intelligence tests. There are, at the 
present time, over thirty well-known group tests on the 
market which are designed for use in the schools. They are 
adapted to stages of development ranging from the kinder- 
garten to the university. They are used not only in the 
schools, but also in the industries and in the courts. The
terms in which mental abilities are described have become 
incorporated into popular language. The possibility of 
measuring an individual’s intelligence by a short and simple 
test has captured the imagination of school people and of the 
general public. 


1. A sample intelligence test 


In order that the reader may at the outset of the discus- 
sion make direct acquaintance with this one type of mental 
test, the intelligence test, he is here given an oppor-
tunity to examine or to take an abbreviated form of such a
test. The purpose of putting the test in at this point is to 
give the reader an idea of what we are talking about when we 
discuss mental tests. He should, of course, not draw con-
clusions as to what the test measures or as to the meaning
of the score until he has read the later chapters. 

The following test was designed for group administration. 
It is graded to suit the capacity of high school seniors and 
college freshmen. The original from which the abbreviated 
form was made up is Test IV, Psychological Examination, 
by L. L. Thurstone.1

The directions should be followed faithfully. 


DIRECTIONS

This is a test to see how quickly and accurately you can think. 
The result of the test will be used by your advisers in order that 
they may know more about your abilities. 

On the inside pages there are 56 short problems. In each case 
you are told exactly what to do. Notice the instructions care- 
fully. You may use the margin for figuring. 

If you come to a problem that you do not understand, go to 
the next problem. 

Take ten minutes. Solve as many problems as you can in the 
time allowed. 

Solve the problems in order given. Do not skip about on the 
page. 

THE TEST

1. Underline the correct answer. 


London is in England Australia Brazil 
Spain


The correct word is England. Underline that word. 


2. Underline the correct answer. 
Boston is in Connecticut Rhode Island Maine 
Massachusetts 

3. Underline the correct answer. 


Diamonds are obtained from mines reefs 
elephants oysters 


1 This test is published by C. H. Stoelting Company and is used by per-
mission of the author.


4. Underline two words that have the same relation as loco-
motive and train.
station horse hub baggage buggy

Underline horse and buggy because the horse pulls the
buggy and the locomotive pulls the train.


5. Underline two words that have the same relation as good
and bad.
taste sweet conduct sour polite

Underline sweet and sour because they are opposite in
meaning, just as good and bad are opposite in meaning.


6. Underline two words that have the same relation as ear and
hear.
eye hair blue see eyebrow


7. Underline two words that have the same relation as palace
and king.
hut peasant barn farm city


8. Make a perfect sentence. Write one word on a blank.
There are ........ days in a week.

Write the word seven in the blank.


9. Make a perfect sentence. One word on a blank.
hem boy. will c.-s...s--0- his shandeites..-o-:- plays with fire.


10. If the following conclusion is true, underline true; if it is
false, underline false.
Brown is shorter than Smith. Jones is shorter than
Brown. Therefore Jones is shorter than Smith.
True False (Underline one)


11. Don’t put all your eggs in one basket.
Check two of the following statements with nearly the
same meaning as the above proverb:
....The mouse that has but one hole is soon caught.
....Catch the bear before you sell his skin.
....The proof of the pudding is the eating.
....Put not all your crocks on one shelf.

Check the first and fourth statements.


12. Tall oaks from little acorns grow.
Check two of the following statements with the same
meaning as the above proverb:
....No grass grows on a beaten road.
....Large streams from little fountains flow.
....The exception proves the rule.
....Great ends from little beginnings.


13. Write the two numbers that should come next.
2 4 6 8 10 12

The two numbers are 14 and 16.


14. Write the two numbers that should come next.
2 2 3 3 4 4


15. Write the two numbers that should come next.
1 7 2 7 3 7

The two numbers are 4 and 7.


16. Write the next two numbers.
1 4 7 10 13 16


17. Underline the correct answer.
Arthur Brisbane is famous as a newspaper man
comic artist athlete actor.


18. Underline two words with the same relation as egg and
bird.
crack seed plant grow nest


19. Make a perfect sentence. One word on a blank.
NES POOL scasmtererstsee Gs ner ys DECAUSE! acc Coeredacsenvansevoncat
LSB aeaceenaaere dances MOUNINPELO seen seree erate acs


20. John’s birthday is after Harry’s, and Harry’s birthday is
after Tom’s. Therefore Tom’s birthday is before John’s.
True False (Underline one)


21. “Every one of us, whatever our speculative opinions, knows
better than he practices, and recognizes a better law than
he obeys.” (Froude.)
Check two of the following statements with the same
meaning as the quotation above:
....To know right is to do the right.
....Our speculative opinions determine our actions.
....Our deeds fall short of the actions we approve.
....Our ideas are in advance of our every day behavior.


22. Write two numbers that should come next.
14 16 18 20 22 24


23. Underline the correct answer.
Yale University is at Annapolis Ithaca Cambridge
New Haven


24. Underline two words with the same relation as foot and
man.
hoof leather shoe cow leg


25. Underline the correct answer.
“The makings of a nation” is an ad. of a tobacco
flour beer health food


26. Underline two words with the same relation as wash and
face.
sweep broom floor straw clean


27. Make a perfect sentence. One word on a blank.
It is very ........ to become ........ acquainted
........ persons who ........ timid.


28. Since all metals are elements, the most rare of all the metals
must be the most rare of all the elements.
True False (Underline one)


29. A small leak will sink a ship.
Check two of the following statements with the same
meaning as the above proverb:
....Untempted virtue is easily retained.
....A spark may start a great fire.
....When the cat is away the mice will play.
....Reputation may be ruined by a word.


30. Write the two numbers that should come next.
2 3 5 8 12 17


31. Underline the correct answer.
Dioxygen is a disinfectant food product
patent medicine tooth paste


32. Underline the two words with the same relation as skating
and winter.
swimming diving floating hole summer


33. Underline the correct answer.
The Corona is a kind of phonograph multigraph
adding machine typewriter


34. Underline two words with the same relation as able and
unable.
muscle exercise strong ax weak


35. Make a perfect sentence. One word on a blank.
his intelligence and faithfulness.


36. All the members of the Civic Club are members of the
University Club; Smith is not a member of the University
Club; therefore he is not a member of the Civic Club.
True False (Underline one)


37. “Equality is the life of conversation; and he is as much out
who assumes to himself any part above another, as he who
considers himself below the rest of society.” (Steele.)
Check two of the following statements with the same
meaning as the above quotation.
....One should assume himself below those with whom
he converses.
....One should not consider himself on a different level
from those with whom he converses.
....One must talk or be talked to, there is no middle
ground.
....Conversation should be democratic.


38. Write the two numbers that should come next.
28 31 33 36 38 41


39. Underline the correct answer.
The Delco System is used in plumbing filing
ignition cataloguing


40. Underline two words that have the same relation as tele-
phone and hear.
shout telegraph spyglass distance see


41. Underline the correct answer.
Darwin was most famous in literature science
war politics


42. Underline two words with the same relation as reward and
hero.
God everlasting punish pain traitor


43. Make a perfect sentence. One word on a blank.


44. All double-convex lenses magnify; plano-convex lenses are
not double-convex; therefore plano-convex lenses do not
magnify.
True False (Underline one)


45. Familiarity breeds contempt.
Check two of the following statements with the same
meaning as the above proverb.
....Every bird likes its own nest best.
....Sweets grown common lose their dear delight.
....Birds of a feather flock together.
....No man is a hero to his valet.


46. Write the two numbers that should come next.
42 41 37 36 32 31


47. Underline the correct answer.
The battle of Lexington was fought in 1620 1775
1812 1864


48. Underline two words with the same relation as floor-walker
and store.
policeman fire street conductor wagon


49. Underline the correct answer.
The Overland car is made in Toledo Flint
Buffalo Detroit


50. Underline two words with the same relation as table and
wood.
stove bottle paper iron cork


51. Make a perfect sentence. One word on a blank.
PDOs eeseeocs 1S SRW YG SPINES, ...ctesavssse cose storm clouds
SOMEUIM ESine a. .ccassovsescees Miser tent es ester us.


52. No Athenians could have been Helots, for all Helots were
slaves, and all Athenians were free men.
True False (Underline one)


53. “No great genius was ever without some mixture of mad-
ness, nor can anything grand or superior to the voice of
common mortals be spoken except by the agitated soul.”
(Aristotle.)
Check two of the following statements with the same
meaning as the above quotation.
....Genius is essentially hard work and persistence.
....Contented and serene characters are the ones that
produce works of genius.
....Genius and insanity have certain elements in common.
....Strokes of genius are likely to come after times of
great disturbance or stress for the individual.


54. Write the two numbers that should come next.
15 18 24 33 45 60


55. Underline the correct answer.
Plymouth Rock is a kind of horse cattle
granite fowl


56. Underline two words with the same relation as Japanese
and Japan.
Dutch Russia Holland Siberia Spanish


57. Underline the correct answer.
Rio de Janeiro is a city of Spain Argentina
Portugal Brazil


58. Underline two words with the same relation as quarrel and 
enemy. 


policeman agreeable foe agree friend 


59. Make a perfect sentence. One word on a blank. 
Heer Pte ies things are.................satisfying to an 
OUGUMAL Veveteesateseercecetees than congenial friends. 


60. The recent panic occurred just after the President announced 
his policy regarding corporations in interstate commerce; 
therefore the President is to blame for the panic. 


True False (Underline) 


61. Rome was not built in a day. 


Check two of the statements with the same meaning as 
the above proverb. 


....To climb steep hills requires slow pace. 
.... When in Rome, do as the Romans do. 
....The result tests the work. 

....Napoleon himself was once a crying babe. 


62. Write the two numbers that should come next. 
19 21 23 18 20 22


63. Underline the correct answer. 


The spark plug belongs in the carburetor mani- 
fold crank case cylinder head 


64. Underline two words with the same relation as eat and fat. 
food starve thin bread thirsty 


The reader may now check the number of items he has 
marked correctly by referring to the key in the appendix to
the chapter on page 28. The score is found by counting
the number of correct items.

The median score obtained by high school seniors or 
college freshmen when the test is taken under ordinary con- 
ditions is about 30. 
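
The scoring rule just described, one point for each item marked correctly, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `ILLUSTRATIVE_KEY`, `score_test`, and `compare_to_median` are hypothetical, and the three-item key below is a stand-in, since the actual key belongs to the appendix and is not reproduced here.

```python
# Scoring sketch: one point per correctly marked item.
# ILLUSTRATIVE_KEY is a hypothetical three-item key used only for illustration;
# the real key is in the chapter appendix and is not reproduced here.
ILLUSTRATIVE_KEY = {1: "England", 2: "Massachusetts", 3: "mines"}

# Median reported in the text for high school seniors or college freshmen.
REPORTED_MEDIAN = 30


def score_test(responses, key):
    """Return the number of items whose response matches the key."""
    return sum(1 for item, answer in key.items() if responses.get(item) == answer)


def compare_to_median(score, median=REPORTED_MEDIAN):
    """Say whether a full-test score is below, at, or above the reference median."""
    if score < median:
        return "below"
    if score > median:
        return "above"
    return "at"


# Hypothetical responses: item 2 is marked wrongly, so two items count as correct.
responses = {1: "England", 2: "Maine", 3: "mines"}
partial_score = score_test(responses, ILLUSTRATIVE_KEY)
```

Omitted items simply earn no point, which agrees with the direction to pass over a problem that is not understood. The comparison with the median is meaningful only for a score on the whole test taken in the ten minutes allowed, not for a partial score such as the one above.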


2. Reliability and meaning of mental tests 

We are now in the midst of a very lively discussion con- 
cerning the reliability and the meaning of mental tests such 
as this one. This discussion is carried on chiefly by lay
writers, or at least by observers who have not had specialized 
training in psychology or do not have an intimate knowledge 
of mental tests. Popular opinion goes to greater extremes 
concerning mental tests than does the opinion of psycholo-
gists.1 It is marked both by more implicit confidence in
them and more extreme skepticism. Similarly, popular 
opinion shows the sharpest fluctuation from one period to 
another. Before the World War, the average intelligent 
layman probably had little confidence in the value or the 
use of mental tests. After the War, he believed that 
psychologists had devised a simple and relatively perfect 
method of measuring intelligence. Then a reaction against 
this extreme view set in, and it is possible that popular 
opinion is now swinging again toward the extreme of skep- 
ticism. 

That popular opinion should be subject to more extreme 
fluctuation than the opinion of psychologists is easily under- 
stood. The psychologist realizes that mental tests are the 
product of a long period of experimentation. He knows 
that our present-day tests have been developed from earlier 
forms of tests, that they constitute an improvement over 
the earlier forms, but that they are subject in some degree to
the limitations of the first attempts. He knows what the
difficulties were which confronted the earlier experimenters,
and the methods which were adopted to overcome these
difficulties. He knows, furthermore, that these difficulties
have been only partly met.

1 For the agreement among psychologists see Frank N. Freeman, “A
Referendum of Psychologists,” in The Century, December, 1923.

The layman, on the other hand, has been accustomed to 
think of mental tests as something absolutely new. He 
regards them as an invention. He believes that psychol- 
ogists have made a unique analysis or classification of human 
abilities, and that they have been able to devise methods 
by which these abilities may be perfectly measured. This 
view seems to leave the way open for only one of two extreme 
conclusions. One may either regard mental tests as wholly 
successful, or he may entirely reject them. This view of the 
tests seems to furnish no opportunity for an intermediate 
view. There is no basis upon which one may form an opin- 
ion of the limitations of the tests and of the range of the 
problems to which they are adapted. 

A correct knowledge of the nature and development of 
mental tests shows the absurdity of the fundamental assump- 
tion which underlies the popular view. Mental tests are 
not absolutely new devices. They are not magical instru- 
ments for the discrimination and measurement of mental 
capacities. Their fundamental characteristics are the same 
as those of the ordinary examination with which we have 
been familiar so long. 

The methods of the examination have, of course, been 
very greatly refined. Students of mental tests have dis- 
covered what will work and what will not work. They have 
devised methods of organizing tests so that they will measure 
the abilities which are the most significant factors in general 
human behavior. They have discovered methods of making 
the tests so widely applicable that broad comparisons can 
be made by means of them. Mental tests enable us to secure
with comparative ease a more widely comparable measure
than could be secured by any other means. Their accuracy, 
furthermore, is at least comparable to that of the best meth- 
ods which exist beside them. In addition to this, they enable 
us to make a type of analysis of mental abilities which we 
cannot make as satisfactorily by any other means. All
these advantages of mental tests must be granted. 

At the same time, it must be recognized that enthusiastic 
advocates of mental tests sometimes give excuse for the un- 
due enthusiasms of the layman. While we recognize the 
advantages which they offer, we must not exaggerate the 
accuracy of the measures which they yield. We must 
recognize that the ratings of human capacity which they 
enable us to make are correct only within certain limits of 
error. We must recognize further that the nature of the 
capacities which they measure is known to us only in a
rough way. The interrelationships between the abilities 
which are measured by various tests are often very sur- 
prising. They indicate that abilities which we would not 
expect to be closely related do, as a matter of fact, correspond 
very closely, and abilities which we are accustomed to be- 
lieve closely associated are really comparatively independent 
of one another. 
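
How closely two abilities correspond is commonly expressed by a coefficient of correlation between the scores on two tests. The following is a minimal sketch of the ordinary product-moment coefficient; the function name `pearson_r` and the two short score lists are hypothetical and serve only to show the computation, not to report any result.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and of squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return sxy / sqrt(sxx * syy)


# Made-up scores for five pupils on two tests, chosen so that the second
# list is exactly double the first and the correspondence is perfect.
test_a = [10, 12, 14, 16, 18]
test_b = [20, 24, 28, 32, 36]
r_ab = pearson_r(test_a, test_b)
```

A coefficient near 1 means that pupils who stand high on one test stand high on the other; a coefficient near 0 means that the two abilities are comparatively independent of one another.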

The advancement of the technique of mental tests will be 
furthered most by a sane recognition of both the advantages 
and the limitations of the present tests. We may be grate- 
ful for that which they furnish us, without exaggerating what 
they have to offer. An unwarranted satisfaction with pre- 
sent tests will prove a hindrance to the experimentation which 
is necessary for the development of tests of greater accuracy, 
greater range, and of more analytical power than our present 
tests possess. 

Our present tests are most successful as measures of the 
composite of mental abilities which is sometimes called
intelligence. They are weakest in their failure adequately 
to analyze intellectual ability into the various forms or the 
various elements of which it is composed. The measure- 
ment of this general capacity is of great value. The ability 
to analyze mental abilities into types, however, would be of 
great additional value. It is probably in this direction that 
the future development of tests will move. We shall, 
undoubtedly, be able in time to isolate the various abilities 
more adequately than we can with our present means. In 
addition to the various aspects of intellectual ability itself 
will come the measurement of the various non-intellectual 
traits, such as will, emotion, and moral attitude. The de- 
velopment of the technique for measuring these various 
mental capacities will ultimately lead to a comprehensive 
test or series of mental tests which will enable us to make an 
all-round examination of the individual. 


3. Definition and classification of tests

A test may be fundamentally distinguished from a de- 
scriptive account of a mental function. Both the descrip- 
tive account and the mental test involve the use of accurate 
experimental technique, but the aim of the two forms of 
procedure is different. The aim of the descriptive account 
is to determine how a mental function operates, how it 
develops, what its causes and effects are. We may be 
interested, for example, in breaking up a perception into 
its constituent sensations. We find that what looks at first 
glance like a simple experience is really a complex one. We 
may be interested in learning the causes of peculiar experi- 
ences, such as optical illusions. In doing this, we again 
break up or analyze the experience into its parts. We may 
be interested in studying the growth or development of a 
mental ability. Such studies are concerned with the learning 
process. They trace the effect of practice upon skill or
knowledge. A somewhat similar study is the investigation
of the development of the mental capacity in the child as he 
grows from babyhood to maturity, or the somewhat parallel 
development of the mental life of animals from the lowest 
organisms to the higher mammals. 

In contrast to these types of study and to these aims, the 
mental test seeks to measure the strength, precision, or effec- 
tiveness of the present operation of any mental activity. It 
does not aim to determine how the activity was developed or 
in what it originated, or, necessarily, the elements of which it
is composed. It takes the ability as it exists at the present 
time, and attempts to set up means of estimating its degree. 
In the case of sensation, for example, a test aims not to ana- 
lyze the sensation, but to determine the ability of the indi- 
vidual to discriminate between sensory stimuli which differ 
by a slight degree. In the case of learning, the test seeks 
not to discover how one learns, but to discover the rapidity 
or accuracy with which one person can learn in comparison
with other persons. 

Mental tests again may be distinguished from educa- 
tional tests. The aim of both alike is to measure the present 
efficiency of the individual in certain specific respects. 
They differ, however, in this: whereas the educational tests
seek to measure the products of training, and indirectly to 
determine the efficiency of the training which the individual 
has received, the mental tests aim to measure the original 
capacity which the individual had for the acquirement of 
skill or knowledge or ability. The extent to which educa-
tional and mental tests are able to meet the demands of these
contrasted aims is a matter to which we shall have to give
attention later in the discussion.

We may note in passing that mental tests and educational 
tests have often been studied in relation to one another, in 
order that it might be determined what the ratio is be-
tween the inherent capacity of an individual or group of
individuals and the actual achievement which they have 
made. 

Tests, whether they be mental tests or educational tests, 
are both relative. That is, the score which results from the 
application of the test has significance only by comparison 
with scores which are made by other individuals. The 
score serves as a comparatively exact numerical method of
indicating the rank of the individual in a group in which he 
may be placed or with which he may be compared. The 
absolute score which the individual makes, taken by itself, 
has, therefore, no significance. The method by which is 
expressed the relationship between the score of an individual 
and the scores of the group constitutes one of the important 
phases of the technique of mental testing. 
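
The relative character of a score may be illustrated by a short sketch. It is offered only as an illustration and is not a procedure taken from the text: the function name `percentile_rank`, the convention of counting tied scores as one half, and the two groups of raw scores are all assumptions introduced here. It shows that the same raw score may stand for quite different ranks, according to the group with which it is compared.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, group_scores):
    """Per cent of the group falling below `score`, ties counted as one half.

    This is one common convention for expressing rank in a group; it is an
    assumption of this sketch, not a rule stated in the text.
    """
    ordered = sorted(group_scores)
    if not ordered:
        raise ValueError("group_scores must not be empty")
    below = bisect_left(ordered, score)          # scores strictly lower
    ties = bisect_right(ordered, score) - below  # scores exactly equal
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical raw scores for two comparison groups (invented for illustration).
group_a = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]
group_b = [20, 24, 26, 28, 30, 31, 33, 36, 38, 40]

# The same raw score of 25 is interpreted against each group in turn.
rank_in_a = percentile_rank(25, group_a)
rank_in_b = percentile_rank(25, group_b)
print(rank_in_a, rank_in_b)
```

A raw score of 25 falls near the middle of the first hypothetical group and near the bottom of the second, which is the sense in which the absolute score, taken by itself, has no significance.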

Mental tests are of various kinds, according as they aim 
to measure general or special capacities. The tests which are
most widely used by educators at the present time are gen-
eral tests, usually called general intelligence tests, or some-
times mental alertness tests. There are, however, a consider-
able number of tests in use which aim to measure not all-
round or general intellectual capacity, but which, on the 
other hand, aim to measure some special capacity or set of 
capacities. An example of such tests is the collection of tests 
of musical ability designed by Seashore. A good many tests 
of special capacity have been used in vocational guidance. 
The purpose of this book will be to include a treatment of 
the special tests as well as the general ones. 

We sometimes speak of tests as though they measured 
intellectual capacity directly. This, of course, is not true. 
What they measure is the manifestation of capacity in action 
or in behavior. Intellectual capacity is not something which 
can be seen, felt, heard, or measured in any direct fashion. 
We assume in mental tests that the behavior of the individ-
ual expresses or represents the maximum of which he is
capable. 

Behavior, however, is always conditioned, not only by 
capacity, but also by previous experience or training. A 
person is able to play a piano or to write on a typewriter 
not only because he has the capacity for learning to use these 
instruments, but also because he has gone through a course 
of training. These are specific but rather extreme cases. 
To take a more widely applicable and general case, a person 
is able to use language because he has come in contact with 
language and has acquired the ability to pronounce words 
and a knowledge of the meaning of words. Somewhat more 
generally still, a person has learned to distinguish a color 
because he has met with different colors. He has learned 
to distinguish the pitch of tones because he has met with 
tones of different pitches. 

In all of these cases, or in any case which might be 
mentioned, the capacity which the individual has to start 
with is combined with the results of his training to make up 
the ability which he possesses at the present moment or at 
the moment of being tested. If training is thus always 
present as a factor in determining present ability, how is it 
possible to distinguish native capacity from the results of 
training? To put the question in a somewhat more specific 
way, how can we determine that the differences between 
individuals are due to differences in their capacity, rather 
than to differences in the training which they have received? 

Mental tests are so designed as to meet this difficulty, so 
far as possible, in the following manner: The particular 
activities which are demanded of the individual being tested 
are selected from among those which are common to the 
experience of all the persons who are to be compared. Or, 
to put it in another way, it is assumed that training or ex-
perience in the activities which are being tested is equal,
or as nearly as possible equal, among all the individuals.
For this reason, typewriting or piano-playing, for example, 
would not be taken as the subject of a mental test for motor 
dexterity or manual capacity. These particular complex 
types of behavior require special practice for their mastery. 
If one wished to test capacity in learning of this type, it 
would be necessary to devise some activity of a similar 
nature to these, but one which had never been practiced by 
any of the individuals who were to be tested. The meas- 
urement of the ability of the individual to distinguish the 
pitch of two tones, or the shade of two colors, is, in contrast 
to typewriting or piano-playing, a suitable test for native 
capacity, because all the individuals who are likely to be 
tested have had sufficient practice in doing these things to 
bring their capacities up to something near their maximum. 
This is partly due to the fact that everybody has had 
practice in these things, and partly to the fact that they are 
not so susceptible to practice as the more complex activities 
of piano-playing or typewriting. 

It may, of course, be questioned whether it is ever possible 
to find activities in which the individuals who are to be 
compared have had equal opportunity for training, or in 
which training is a negligible factor. We are faced here 
with a dilemma. If we choose to measure abilities in which 
training has little effect, we find that our measurements 
have very little general significance. If, on the other hand, 
we select abilities which are complex in their nature, and 
which are therefore of general significance, we find it diffi- 
cult to secure activities in which previous experience is not 
an important factor. This is one of the problems concerning 
which a knowledge of the development of tests and their 
technique will enable us to be duly critical and on our guard. 

We may summarize this descriptive account of mental 
tests in the following definition: “Mental tests are instru-
ments for the measurement of relative mental capacity, either
general or special.” We may comment briefly on some of
the terms of this definition. 

Mental tests are instruments of measurement and not 
means of making guesses or estimates. They are therefore 
to be distinguished from methods of rating individual 
abilities by means of rating scales. They issue in numerical 
scores which can be manipulated by mathematical processes 
and combined or compared with other numerical scores. 

The method by which these scores are obtained may not, 
of course, be valid, but it is of advantage that the results 
of the tests be thus expressed in quantities which are subject 
to mathematical formulation. 

The significance of the measurements which are made by 
means of mental tests grows out of the fact that the tests 
are standardized. This standardization concerns the ma- 
terials which go into the tests, the method of procedure 
in giving the tests and in scoring the results, and the norms 
with which the scores of individuals are compared. Stand- 
ardization, in brief, means that all of these matters are 
worked out by actual trial. The materials and methods 
of procedure are not invented by some psychologist in the 
seclusion of his study, but they are arrived at after the tests 
have been given to large numbers of children and after the 
procedure has been discovered which proves to be successful. 

The measurement is relative, as already noted, because 
the score which any individual makes is to be interpreted in 
comparison with the scores which are made by other in- 
dividuals. This comparison, in the case of tests which aim 
to measure native capacities rather than the results of train- 
ing, is completely satisfactory only in groups which have 
had approximately equal opportunity, or with reference to 
mental functions in which differences of training or oppor-
tunity have a very slight effect. It is not possible to deter-
mine with exactness just how far a test score may be due to
training or to native capacity. This constitutes one of the 
large problems, both in the organization and the interpre- 
tation of tests, which we shall have to discuss at greater 
length in a later chapter. 


4. The uses of mental tests 


We may anticipate the more detailed discussion of the 
applications of mental tests, in the later chapters, by giving 
a brief summary or survey of their uses, as a means of in- 
troducing us to the fuller description of the characteristics 
and organization of mental tests themselves. Some of these 
uses are practical and some theoretical. We are more im- 
mediately concerned with the practical uses than with the 
theoretical uses. It may ultimately turn out, however, 
that the theoretical interpretations of the results of mental 
tests may have a more far-reaching practical effect than their 
immediate practical application. 

The first use to which tests are put is the classification of 
pupils in school according to ability. It has long been 
recognized that pupils differ very widely in their capacity 
to do school work. This recognition first became clear with
reference to feeble-minded or backward children. The fact 
of backwardness was forced upon the attention of school 
authorities by the large amount of retardation and elimina- 
tion which it produced. After tests had been made of all 
children, and the distribution of pupils’ abilities had been 
tabulated, however, it was discovered that as many pupils 
possessed extremely high as extremely low ability. In fact, 
the evidence has gone to show that the pupils are distributed 
with reference to their intellectual ability in the same 
fashion above the median as below it. The problem of 
classification, therefore, is a much broader one than is re-
presented in the selection of a few pupils for special instruc-
tion because of their mental deficiency. It is only within
the last decade or so, however, that this has been clearly 
recognized as a problem. 

The classification of pupils according to abilities may be 
either vertical or horizontal. By vertical classification is 
meant the arrangement of pupils at successive levels of 
attainment or advancement through the school. According 
to this method the pupils who are able to do a certain grade 
of work are placed in a particular school grade. Those who
are able to do a higher grade of work are placed in a higher
grade. Promotion, according to this type of classification,
is based on mental capacity alone without regard to any 
other factor. This may be called vertical classification. 

Horizontal classification, on the other hand, takes all 
the pupils at a given stage in school advancement, who may 
be widely diverse in their ability to do work at that stage, 
and groups them according to their ability. They are 
grouped into horizontal divisions according to ability, and 
their grouping in vertical divisions is based simply upon 
their age. If this type of classification alone is carried out, 
it does not affect promotion through the school, but does 
affect the difficulty of work or the quality of work which a 
pupil does in each successive grade. Whether vertical 
grouping or horizontal grouping is the better, or whether 
some combination of the two methods is better than either 
alone, is a problem for later more detailed consideration. 

A second general use of tests in the school is to serve as a 
means of diagnosis of the capacity of pupils who present 
problems in adjustment because of their failure to do suc- 
cessfully part or all of the work of the school. The needs of 
individual diagnosis and the subsequent treatment are not 
entirely met by the general classification which has already 
been spoken of. The failure of the pupil may not be due to 
inherent incapacity. His classification in a low group,
therefore, does not solve the problem. The problem which
is presented may be to find a means of awakening the pupil 
to the realization of his capacity. The poor work may be 
due to a variety of causes. Mental tests contribute to the 
solution of the problem by indicating the extent to which 
the poor work is caused by incapacity. While the results of 
the test are not absolutely conclusive in regard to the pupil’s 
capacity, they do, at least, indicate a line of experimental 
treatment which may result in the solution of the difficulty. 

A third use of tests consists of educational guidance. 
Some pupils may, by reason of the degree of general intel- 
lectual ability they possess, or because of the type of their 
capacity, be better adapted to some courses of study than to 
others. Again, the length of time a pupil can profitably 
remain in school may be determined by his native capacity. 
Mental tests serve, therefore, as partial means of estimating 
the kind or extent of work for which the pupil’s capacity 
suits him. Educational guidance takes the form of advice 
in the selection of courses or of the larger groups of courses 
or curricula, or of advice concerning the desirability of 
remaining in school or college or of going to work. 

Educational guidance naturally leads into and prepares 
the way for vocational guidance. The selection of the types 
of work which the pupil takes in school looks forward to the 
type of vocation which he shall pursue. Vocational guid- 
ance takes the matter up at the point where educational 
guidance leaves it, and attempts in a more specific way to 
aid the pupil in the choice of a vocation. 

Tests for vocational guidance may be tests of general 
ability, or tests of special ability. The use of tests of general 
capacity is based upon the assumption that various voca- 
tions require for their successful pursuit different degrees of 
intellectual capacity. If this assumption is correct, it is 
possible within certain limits, by means of tests, to deter-
mine those groups of vocations which a person can expect to
pursue successfully. It would not, of course, determine 
which among a group demanding equal ability he should 
choose. Tests of specialized ability aim more specifically 
to determine whether one can meet the requirements of a 
particular vocation, other than the requirement of general 
capacity. These tests are, for the most part, applicable to 
specialized jobs in industry or to specialized phases of the 
work of a more general vocation. They may consist of 
single tests of simple mental functions or of groups of tests 
which measure a variety of functions, all of which are re- 
quired in the vocation, or of tests which measure a complex 
activity involving a variety of separate capacities. Examples 
of all of these types of tests are in existence and have been 
tried in vocational guidance. 

The application of mental tests for vocational selection, as 
distinguished from guidance, consists in their use in selec- 
tion, transfer, or promotion of employees. Tests are here 
used not to determine what vocation an individual should 
enter, but to determine whether or not individuals who may 
wish to enter a particular vocation meet the conditions of 
that one particular vocation. This concerns the selection of 
employees. Transfer involves somewhat similar principles. 
Employees who are in one department of an organization, 
and who may not be suited to the work of this department, 
may be more capable of performing the work of some other 
department. Tests may be given to them to determine 
this fact. Promotion may be governed in part by capacity 
as measured by tests. 

The use of tests for vocational guidance or for the selec- 
tion of employees depends, of course, on whether or not 
vocations do differ largely in their demands, and whether 
individuals differ in the possession of capacities to meet 
these demands. Upon this question there may be a diverg-
ence of opinion. The complete solution of the problem rests
largely with future experimentation. 

Again, mental tests have been applied to delinquents. 
Their purpose is to assist in fixing the degree of responsibility 
and in indicating the kind of treatment which should be 
given. Tests have been widely used in dealing with juvenile 
delinquents and somewhat less extensively with adult of- 
fenders. Widely different views are held with reference to 
the significance and the interpretation of the results. On 
the one hand, the opinion is rather general that mental in- 
capacity is responsible for a very large share of crime. On 
the other hand, it is believed that while mental deficiency 
is responsible for certain cases of crime, and particularly for 
certain types of crime, it is, on the whole, a relatively minor 
cause. The results of these tests will be reviewed more at
length in a special chapter. 

The final practical use of mental tests which will be 
mentioned here is the measurement of the efficiency of edu- 
cational units. By efficiency is meant the relationship be- 
tween achievement and capacity. In speaking of the 
efficiency of an individual, we ask ourselves not merely what 
capacity he has, but whether he realizes his capacity in 
productive activity. If we assume that mental tests meas- 
ure native capacity, and educational tests measure the result 
of training, it follows that the relationship between the scores 
on educational tests and scores on mental tests will represent 
the efficiency of the individual or of the group. This re- 
lationship has been expressed in the form of a ratio which 
is called the achievement quotient, or the accomplishment ratio. 
The validity of the achievement quotient, of course, depends 
upon the clearness of the distinction between the measure- 
ment of native capacity and of training, and upon the 
accuracy of the measures of both of these factors. 
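
The arithmetic of such a ratio may be sketched briefly. The sketch assumes the convention, common in the practice of the period but not spelled out in the passage above, that both measures are expressed in the same age units and that the quotient is the educational age divided by the mental age and multiplied by 100. The function name and the ages used are hypothetical.

```python
def achievement_quotient(educational_age, mental_age):
    """Ratio of achievement to capacity, expressed as a percentage.

    Assumes both arguments are in the same units (for example, years of
    age as found from educational and mental tests respectively). This
    convention is an assumption of the sketch.
    """
    if mental_age <= 0:
        raise ValueError("mental_age must be positive")
    return 100.0 * educational_age / mental_age


# Hypothetical pupils, invented for illustration.
# Achievement below capacity gives a quotient under 100.
below_capacity = achievement_quotient(educational_age=9.0, mental_age=10.0)
# Achievement above the level expected from capacity gives a quotient over 100.
above_capacity = achievement_quotient(educational_age=11.0, mental_age=10.0)
print(below_capacity, above_capacity)
```

A quotient computed in this way is no more trustworthy than the two measures which enter into it, which is the caution stated in the paragraph above.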

A somewhat more theoretical problem is the determina-
tion of the character of the mental growth of children. The
multitude of scores of children of various ages which have 
been gathered from the application of various mental tests 
yields a mass of material which gives a basis for more valid 
estimates of the character of intellectual growth than we 
have previously possessed. It is true that these measures 
have usually been limited in one respect. They have been 
made upon different children of different ages. They have 
not, that is, given successive measures of the capacity of the 
same child at successive periods in its growth. They there- 
fore give us only a mass picture of the general characteristics 
of growth, and do not enable us to determine what the 
fluctuations in the case of individuals are. Beginnings are 
being made in the successive testing of individual children, 
so that we may, in time, possess a more accurate picture of 
intellectual development. 

One of the long-standing problems with reference to hu- 
man capacity concerns the relative effect of the factors of 
heredity and environment. The question is, do differences 
between individuals depend largely on differences in their 
inherited mental traits, or are they the product of differences 
in training and the less definable features of the general 
mental and physical environment? As we have already
seen, the relative effect of heredity and environment is in 
reality a problem in the interpretation of the test scores 
themselves. It might seem, therefore, as though test scores 
could not be used as means of determining the relative 
share which native capacity has in any other form of achieve- 
ment. The problem presents difficulties, and we are only 
at the beginning of its solution, but there are methods, 
as we shall see in a later chapter, by which we may at least 
make some advance toward its solution. 

Finally, mental tests furnish means of studying the 
interrelationship of mental traits and of investigating
mental types. It is, of course, a moot question whether
mental types exist. By types is meant constellations or 
groups of abilities which are frequently found to exist in 
conjunction with one another. One view is that there do 
not exist such types, but that the various mental abilities 
are just as likely to be associated in one way as in another. 
The study of the correlation of the scores of mental tests will 
ultimately enable us to determine whether such types exist. 
The difficulty with the interpretation at the present time is 
that the tests themselves do not measure very clearly de- 
finable characteristics. There is a large overlapping in the 
functions which are measured by the various tests. This is 
one of the problems for the future which rests, in part, upon 
the development of a somewhat new type of test itself. 
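
The study of correlation mentioned above may be illustrated by a minimal sketch. It uses the product-moment coefficient, which is one common measure of correlation; the passage does not say which coefficient is intended, so this choice is an assumption. The function name and the scores on the two tests are likewise hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dev_x = [x - mean_x for x in xs]  # deviations from the mean of test X
    dev_y = [y - mean_y for y in ys]  # deviations from the mean of test Y
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_xx = sum(a * a for a in dev_x)
    sum_yy = sum(b * b for b in dev_y)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("scores on each test must vary")
    return sum_xy / sqrt(sum_xx * sum_yy)


# Hypothetical scores of the same six persons on two tests (invented data).
memory_test = [12, 15, 11, 18, 14, 16]
reasoning_test = [30, 34, 27, 41, 33, 36]
print(round(pearson_r(memory_test, reasoning_test), 3))
```

A coefficient near +1 would indicate that the two abilities tend to go together in the same persons, and a coefficient near zero that they are associated no more closely than chance would give. Where the tests overlap in the functions they measure, as noted above, a high coefficient cannot by itself establish the existence of a type.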

We have now reviewed briefly the topics or questions 
which will be discussed in a more specialized fashion through- 
out the different chapters of the book. We shall first review 
at somewhat greater length the historical development of 
tests, in order that we may bring out the contrast between 
the earlier less successful attempts and the later more suc- 
cessful ones. We shall then pass in review the different 
kinds of tests, examining in some detail the most prominent 
tests which are of practical importance in the school. 
Finally, we shall consider the uses of tests, particularly those 
which have important applications in education. Through- 
out the discussion we shall emphasize particularly those 
aspects of tests which are of practical importance. 


CORRECT ANSWERS TO THE ITEMS OF THE TEST ON PAGE 4

2. Massachusetts
3. mines
6. eye, see
7. hut, peasant
9. burn, he
10. true
12. numbers 2 and 4
16. 19, 22
17. newspaper man
18. seed, plant
19. man, he, had, eat (or equivalent)
20. true
21. third and fourth
22. 26, 28
23. New Haven
24. hoof, cow
25. tobacco
26. sweep, floor
27. difficult, well, with, are
28. false
29. second and fourth
30. 23, 30
31. disinfectant
32. swimming, summer
33. typewriter
34. strong, weak
35. is, animal, of
36. true
37. second and fourth
38. 43, 46
39. ignition
40. spyglass, see
41. science
42. punish, traitor
43. not, where, can
44. false
45. second and fourth
46. [illegible]
47. [illegible]
48. policeman, street
49. Toledo
50. stove, iron
51. sun, but, hide, from
52. true
53. third and fourth
54. 78, 99
55. fowl
56. Dutch, Holland
57. Brazil
58. agree, friend
59. few, more, person
60. false
61. first and fourth
62. [illegible]
63. cylinder head
64. starve, thin


CHAPTER II
EARLY EXPERIMENTATION WITH TESTS 


1. Early studies of individual differences 

THE first clear case on record of the scientific recognition of
individual differences in mental abilities occurred in con- 
nection with the work of the Greenwich Astronomical 
Observatory, in England. In 1795, one of the observers was 
found to differ from his colleagues in his estimate of the 
time of transit of a star. The method which was pursued 
was to watch the star through a telescope and to note the 
time at which the image of the star crossed a line in the 
field of view. This required that the observer should watch 
the star until it approached the line and then look at a clock 
and estimate the exact time at which the star crossed the 
line. Because this particular observer differed from his 
colleagues, he was discharged from the staff. 

It was later discovered, however, that it is incorrect to 
assume that some observers are right and others wrong. It 
was found that there is an error of observation in the case 
of all observers, and that the amount of this error differs 
with different individuals. This difference in the amount 
of error was called the personal equation. This term, which 
first referred to individual differences in the reaction time of 
observation, came later to be applied to differences in all 
sorts of mental attitudes and was adopted into general use. 

In 1822 astronomers came to recognize the difference in 
the reaction time of observers and to make allowance for it, 
but the systematic study of individual differences was not 
made until more than fifty years later. When psychological 
laboratories were first founded, the interest, as was remarked
in the first chapter, was mainly in general laws or general
principles of human behavior. An illustration of one of the 
earlier generalizations is the Weber-Fechner Law. This law 
concerns the relationship between the intensity of a stimulus 
and the amount of increase in the stimulus which is necessary 
in order that the person shall detect a difference. In gen- 
eral, the principle is that if a stimulus is very intense, it is 
necessary to add a large increment in order that a difference 
shall be perceived, whereas if the stimulus is very faint, a 
small difference will be perceived. This principle holds for 
all the different senses, but the proportionate amount which 
must be added so as to be perceived differs somewhat among 
the various senses. For example, if one is looking at two 
lights which differ from one another by a slight amount, it is 
necessary that the one shall be one per cent more intense 
than the other in order that the difference may be detected; 
if the lights are seen in succession, on the other hand, one 
must be ten per cent more intense than the other. In the 
case of weights, one must ordinarily be about five per cent 
heavier than the other, in order that the difference may be 
distinguished. 
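
The proportional rule just described may be put in the form of a short sketch. The three fractions are those quoted in the paragraph above; the function names, the dictionary keys, and the sample intensities are hypothetical, and the label for the one per cent case assumes that the two lights are there compared together rather than in succession.

```python
# Proportion by which one stimulus must exceed another to be told apart,
# as quoted in the text. The key names are assumptions of this sketch.
WEBER_FRACTIONS = {
    "lights_compared_together": 0.01,
    "lights_in_succession": 0.10,
    "lifted_weights": 0.05,
}


def just_noticeable_increment(intensity, fraction):
    """Smallest increase in the stimulus that can be detected.

    The increment is proportional to the intensity already present, so a
    strong stimulus needs a large addition and a faint one a small addition.
    """
    if intensity < 0 or fraction < 0:
        raise ValueError("intensity and fraction must not be negative")
    return intensity * fraction


def is_detectable(base, comparison, fraction):
    """True if the two stimuli differ by at least the required proportion."""
    return abs(comparison - base) >= just_noticeable_increment(base, fraction)


# Hypothetical weights in grams, invented for illustration.
k = WEBER_FRACTIONS["lifted_weights"]
print(just_noticeable_increment(200, k))   # increment needed at 200 grams
print(just_noticeable_increment(1000, k))  # increment needed at 1000 grams
print(is_detectable(200, 212, k), is_detectable(200, 205, k))
```

With a fraction of five per cent, a weight of 200 grams requires about 10 grams to be added and a weight of 1000 grams about 50 grams, so that 212 grams can be told from 200 while 205 grams cannot.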

It was with such general laws as this, which deal with 
facts of behavior common to all persons, that the earlier 
scientific studies in psychology were concerned. It became 
apparent before long, however, that the differences in the 
behavior of different persons were of such importance that 
they could not be neglected. It had been common to desig- 
nate the differences among the individuals of a group of 
observers by the term probable error. This term repre- 
sents approximately the average amount by which the re- 
actions of the various individuals differ from the typical 
reaction of the group. It became evident, in the course of 
experimentation, that these divergencies of individuals 
from the other individuals of the group are not due mainly
to error, but constitute real differences in their mental
capacity or modes of behavior. 

It is interesting that one of the earliest forms of behavior, 
in which these individual differences were clearly recog- 
nized, was the same which occasioned the recognition of the 
personal equation in the astronomical observatory. This 
form of behavior is reaction. Cattell, working in the labora- 
tory of Wundt, discovered there were characteristic differ- 
ences in the reaction time of different persons. This un- 
doubtedly called his attention to the necessity of studying 
individual differences, and stimulated his later experimen- 
tation with mental tests, which proved the starting-point of 
the development of these tests in the United States. 

Another source of interest in individual differences is to 
be found in the studies of heredity among the English school 
of scientists. A group of men, including Charles Darwin, 
Wallace, Huxley, and Spencer, were students of evolution, 
and of the inheritance of physical characteristics as one 
phase of evolution. Francis Galton, who was a cousin of 
Charles Darwin, became interested in the extension of this 
study to the inheritance of mental characteristics. He 
believed that temperamental traits and differences in in- 
tellectual capacity are inherited in the same way as are 
physical traits. Galton made a number of scientific in- 
vestigations to produce evidence on mental inheritance. 

In order that the inheritance of differences might be 
studied, it became necessary to discover means of measuring 
these differences. One of the methods which Galton used 
was the questionnaire. He investigated differences in the 
vividness or accuracy of mental imagery by asking a large 
number of persons to report what they could remember of 
the appearance of their breakfast table. In addition to this 
method, he developed certain instruments for the study of 
differences of sensation. One of these was the so-called
Galton whistle, which is designed to measure the highest
tone which it is possible for a person to hear. 

Cattell was for a time associated with Galton, and this
association strengthened the interest in individual differ-
ences which had been developed from his earlier study.
Upon taking up his work as a teacher of psychology in the
United States, Cattell proposed a program of tests. This
program was published in the British journal, Mind, in an
article which appeared¹ in 1890. Galton contributed
a number of comments at the end of the article, and this 
gives us direct evidence of the connection between the study 
of inheritance in England and the mental testing movement 
in the United States. 

The purpose of the program of tests which was set forth 
in Cattell’s article was as follows: first, to determine the 
constancy of mental processes, or the degree in which they 
vary from time to time in the same individual; second, to 
determine the degree of interdependence between the various 
mental processes; and, third, to determine the amount of 
their variation under different circumstances. While the 
tests constitute means of measuring the differences between 
the behavior of different persons, it will be seen that a con- 
siderable share of the interest in them was still concerned 
with the analysis of the mental life of the individual con- 
sidered alone. 

The character of these early tests may be gathered from 
the list of ten which were recommended for most extended 
use. They were as follows: 


1. Measurement of the strength of grip by the dynamometer. 
This is an instrument containing a strong spring which is 
compressed by the grasping movement of the hand. The 
amount of the pressure which is exerted is recorded upon 
a dial. 


1 J. McK. Cattell, “Mental Tests and Measurements,” in Mind, 1890, 
vol. 15, pp. 373-80. 

2. Measurement of the rate of movement. This consisted in 
the measurement of the quickest possible time in which a 
person could move the hand through fifty centimeters. 

3. The measurement of the smallest distance between two 
points placed on the skin which can be distinguished as two 
by the individual. This measurement was made by an 
instrument called the esthesiometer. 

4. The measurement of the amount of pressure necessary to 
cause pain. The pressure was exerted upon the forehead 
by a strip of hard rubber. 

5. The measurement of the smallest amount of difference in 
weight which can be discriminated. The measurement was 
made by requiring the subject to lift two weights in suc- 
cession. 

6. The measurement of the quickness with which a person can 
react to a sound. 

7. The measurement of the quickness with which a person can 
name ten specimens of four different colors arranged in 
miscellaneous order. 

8. The accuracy with which a person can bisect a fifty-centi- 
meter line. 

9. The accuracy with which the individual can reproduce an 
interval of ten seconds. The subject responded by giving 
a signal to mark the termination of an interval of time which 
he judged equal to one which was marked off for him by two 
previous signals. 

10. Immediate rote memory. The number of consonants spo- 
ken to an individual which he can repeat immediately after- 
ward in series. 


It will be seen that these tests measure acuity of sensation, 
rapidity of movement, simple judgment, and simple memory. 
We shall find it instructive to keep in mind the character 
of these early tests and to compare them with those which 
were employed during the succeeding decade, and with 
those which were developed during the period since 1900. 


2. Early American experiments with tests 
In the few years following 1890, Cattell began to apply 
mental tests to students in Columbia University. These 
tests were continued in systematic fashion until the end of 
the decade, and were reported upon in a monograph by 
Clark Wissler, in 1901. We shall return shortly to some of 
the results which were reported by Wissler. 

The type of interest in mental tests during this period is 
illustrated by the activities of the American Psychological 
Association. At the instance of Cattell, the Association 
appointed a committee, in 1895, for the purpose of formulat- 
ing mental tests, and of developing a program for their use. 
The committee consisted of some of the most prominent 
psychologists of the country. In 1896, the committee pre- 
sented to the Association a long list of tests. It was recom- 
mended that they be given to college students. They were 
chosen for the purpose of measuring intellectual growth and 
individual differences. It was believed that they could be 
given in one hour. They were, of course, to be given in- 
dividually. 

It is not necessary to reproduce in detail this list of tests. 
They were simply elaborations of the list which had earlier 
been proposed by Cattell in his article in Mind. The gen- 
eral character of the mental capacities which were the 
subject of the test was the same. There were only one or 
two tests of a more elaborate type, which were designed to 
measure the ability to react to a complex situation or to 
carry on the processes which we ordinarily designate as 
judgment, thinking, association, or the higher forms of 
memory for complex materials. 

The committee of the Psychological Association was con- 
tinued for several years, but, so far as the record goes, did 
little more as a committee than to report lists of tests of 
this character. The most elaborate use of the tests was 
made at Columbia University. A number of other psychol- 
ogists gave the tests to college students or to other persons. 
For example, during the World’s Columbian Exposition at 
Chicago, in 1893, Jastrow had a booth at which he gave a 
series of tests to persons who offered themselves as subjects. 
Jastrow had previously reported a list of tests which had 
been given college students as early as 1891. Most of these 
early reports, however, contained no statements, or very 
meager statements, concerning the results of the tests. 

A few sporadic tests were given school children during 
this early period. The character of the tests, their aims, 
and their results may be gathered from a few typical illus- 
trations. A memory test, consisting of the measurement of 
the memory span for digits, was given to a group of school 
children by T. L. Bolton, in 1891. The scores which the 
children made in these tests were compared in a very crude 
way with the estimates of their teachers concerning their 
general mental ability. The experiment thus anticipated, 
after a fashion, our modern method of examining and trying 
out tests. 

The results of this comparison are shown in Table I. 


TABLE I. THE RELATIONSHIP BETWEEN TEACHERS’ ESTIMATES 
OF MENTAL ABILITY AND MEMORY OF DIGITS 1 


Classification by the Memory Test 


Classification by 
Teachers’ Estimates 


On the top line of the table, opposite the letter A, are 


1 T. L. Bolton, “The Growth of Memory in School Children”; in American 
Journal of Psychology, 1891, vol. 4, pp. 362-80. 
indicated the children who were rated the highest by the 
teacher. In the middle row, are those who were rated as 
average, and in the bottom row those who were rated as 
poor. In the vertical columns are represented the per- 
centages of the children of these various groups, based on 
teachers’ estimates, who fell into the three groups on the 
basis of the memory tests. For example, to begin with the 
upper left-hand figure, we see that 32.6 per cent of the chil- 
dren who are rated in the top third by the teachers fell also 
into the top third in the memory test, 51 per cent of these 
children fell into the middle third in the memory test, and 
16.3 per cent in the lower third. Evidently, then, there was 
some relationship between the teachers’ estimates and the 
tests, although that relationship was not at all close. If we 
examine the second and third rows, we find that the cor- 
respondence between the memory-test scores and the teach- 
ers’ judgments of the abilities of the children in the two lower 
groups is very slight. 
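
The percentage table described above can be rebuilt from raw classifications. What follows is a minimal sketch and not Bolton's own procedure: the function name `row_percentages`, the group labels, and the six sample children are all hypothetical, chosen only to show how each row of such a table is computed.

```python
from collections import Counter


def row_percentages(teacher_groups, test_groups, labels=("A", "B", "C")):
    """Cross-tabulate two three-way classifications as row percentages.

    teacher_groups[i] and test_groups[i] are the labels given to child i.
    Returns {teacher_label: {test_label: percent_of_that_teacher_group}}.
    """
    counts = Counter(zip(teacher_groups, test_groups))
    table = {}
    for row in labels:
        row_total = sum(counts[(row, col)] for col in labels)
        table[row] = {
            col: (100.0 * counts[(row, col)] / row_total if row_total else 0.0)
            for col in labels
        }
    return table


# Hypothetical ratings for six children (these are not Bolton's data).
teacher = ["A", "A", "A", "A", "B", "C"]
memory = ["A", "B", "B", "C", "B", "C"]
table = row_percentages(teacher, memory)
```

Each row of the result sums to 100, so a test that agreed perfectly with the teachers would place 100 per cent of every row on the diagonal.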

This method of measuring the relationship between a test 
and some other measure of a child’s ability is, of course, a 
rough and crude method. It will be found to be in sharp 
contrast with the more refined methods which were later put 
into operation. So far as the comparison may be relied 
upon, however, it indicates that the effort to discover a 
means of testing the child which would agree with the esti- 
mate made of him by his teachers was almost a total failure. 

A somewhat similar comparison between the standing of 
children in certain mental tests and the estimate of their 
ability by their teachers was reported by Gilbert, in 1894.1 
This comparison of the tests with teachers’ estimates was 
rather incidental to the main purpose of Gilbert’s study. 


1 J. A. Gilbert, “Researches on the Mental and Physical Development 
of School Children”; in Studies from the Yale Psychological Laboratory, 
vol. 2, pp. 40-100. 1894. 
His purpose was to measure the growth of children in a 
variety of mental capacities by giving tests to one thousand 
children of different ages. For our present purpose we may 
confine ourselves to his comparison of the standing of three 
groups of children. He asked the teachers to divide the 
children into three groups which should be called bright, 
average, and dull. He gives us the average scores made by 
these three groups in each of the tests. The tests include 
chiefly measurements of reaction time, simple memory, and 
various types of sensory discrimination. We may take, as a 
single example, the comparison of the average reaction time 
of the three groups. They are as follows: 


Bright Average Dull 
20.7 oS 22.4 


If we may take the averages as a reliable indication of 
the differences between these groups, we may say that there 
is a slight difference in favor of the bright children. The 
smaller reaction time, of course, indicates the higher score. 
We must interpret this difference, however, in the light of 
the variations which we find in each group. The average 
mean variation which is reported by Gilbert is 3.6. This is 
over twice the difference between the average of the bright 
group and the dull group. It indicates, therefore, that the 
difference between the groups is so slight in comparison to 
the differences between the individuals in each group that 
it would be at least of no diagnostic value and is of little 
significance of any sort. We must, of course, raise the 
question whether the teachers’ estimates gave a reliable 
classification. 
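
The comparison of the gap between group averages with the spread inside the groups can be checked by simple arithmetic. The sketch below uses only the three figures quoted above (20.7, 22.4, and 3.6); the function name `gap_relative_to_spread` is a hypothetical helper, and the units are left as given in the text.

```python
def gap_relative_to_spread(mean_a, mean_b, mean_variation):
    """Return (gap, ratio): the absolute difference between two group
    averages, and how many times larger the mean variation is than that gap."""
    gap = abs(mean_a - mean_b)
    if gap == 0:
        raise ValueError("the two averages are identical")
    return gap, mean_variation / gap


# Figures quoted in the text for Gilbert's reaction-time comparison.
gap, ratio = gap_relative_to_spread(20.7, 22.4, 3.6)
```

With these figures the gap is about 1.7 and the mean variation is roughly 2.1 times as large, which is the sense in which the variation is "over twice" the difference between the bright and the dull groups.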

Certain pertinent questions regarding the technique of 
making these judgments are not answered in the report. 
For example, did the teachers divide the classes into equal 
groups? Did they allow, in making their judgment, for the 
differences in age of the pupils they compared? Did they 
have an adequate idea of what was meant by brightness? 
Did they distinguish between natural ability to do school 
work and the actual attainment of the children, which we 
know does not always agree with native ability? Later experi- 
ments with tests have indicated that matters such as these 
are very necessary to take into account, and in the most 
careful manner, if the statistics which result from such com- 
parison are to be at all relied upon. 

On the face of the returns, the indication is that there is 
very little relationship between such a mental ability as is 
represented in the measurement of reaction time and the 
brightness which is exhibited by children in the school. It 
may be said at once that while the classification of the chil- 
dren into three groups was probably not made as accurately 
as could be desired, the low diagnostic value of such a sim- 
ple test as that of reaction time is confirmed by other later 
experiments such as the one reported by Seashore, in 1899.1 
Seashore gave a number of tests to a group of school children, 
among them tests of sensory keenness. These consisted of 
keenness of hearing, discrimination of pitch, and time mem- 
ory. He reported that there was no correlation between the 
standing of the children in these sensory tests and their 
brightness. A later investigator, to whom we shall have 
occasion to refer at some length, Spearman, recalculated by a 
more elaborate method the relationship between pitch dis- 
crimination and brightness as reported by Seashore and re- 
ports a correlation of .20. This correlation, even though it 
indicates some relationship, is so small as to be of no practical 
importance. That is, a test which has such a low correla- 
tion as this could not be used as a means of diagnosis of in- 
tellectual capacity. 

We may take, as our final illustration of the early applica- 


1 C. E. Seashore, Some Psychological Statistics. Univ. of Iowa Studies. 
1899. 
tion of tests to American school children, a study of the 
relationship between motor ability and marks which was 
reported by Bagley, in 1900.1 Bagley applied tests to 
measure the following characteristics of movement: strength, 
rapidity of voluntary movement, accuracy of voluntary 
movement, steadiness of motor control, amount and charac- 
ter of involuntary movement, and reaction time. The re- 
sults from all of these tests taken together were combined to 
form what was called a motor index. The marks which the 
children made in their school subjects were then averaged 
and taken to represent class standing. 

The comparison between the motor index and the class 
standing was made in the following manner. The motor 
indices were first divided into five equal-sized groups, 
according to rank. The average of each of these five groups 
was then calculated. This gave a descending series of 
averages. The average class standing of the children whose 
scores appeared in each of these five groups was then found. 
The two series of averages are given in the following table: 


A COMPARISON OF THE GENERAL MOTOR INDEX AND 
CLASS STANDING 


Motor index............ 961.8 938.3 924.3 909.0 881.9 
Class standing.........  77.8  80.0  83.6  83.8  84.7 


The scores were then reclassified in the reverse manner; 
that is, the school marks were first divided into five groups 
according to rank, and the averages of the groups were 
calculated. The average motor index of the children repre- 
sented in these groups, classified on a basis of class standing, 
was then found. The two series are given below. 


Class standing.........  92.7  87.5  83.0  74.0  67.9 
Motor index............ 917.2 907.1 922.8 931.0 931.0 


1 W. C. Bagley, “Mental and Motor Ability”; in American Journal 
of Psychology, vol. 12, p. 193. 1900-01. 
It will be seen from these comparisons that the children 
who stood high, on the average, in motor ability stood com- 
paratively low in class standing, whereas those that stood 
low in motor ability stood higher in class standing. There is 
apparently what we now call a negative correlation between 
motor index and class standing. Later studies indicate 
that such an opposition between motor ability and school 
marks does not exist. This study, while in comparison with 
other studies of the time it was a carefully conducted experi- 
ment, shows the need of certain precautions in statistical 
procedure which are necessary to observe in making tests 
and in making comparisons from their results. These pre- 
cautions have been discovered by experience, and can be 
said now to constitute a body of technique which is charac- 
teristic of the modern procedure in testing. 
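
The grouping procedure used in the comparison above (rank by one measure, cut into five equal groups, then average both measures within each group) can be sketched as follows. This is an illustration under stated assumptions, not Bagley's own computation: the function name `grouped_means` and the ten sample children are hypothetical.

```python
def grouped_means(primary, secondary, n_groups=5):
    """Rank cases by `primary` (highest first), cut them into equal-sized
    groups, and return (primary_means, secondary_means), one pair per group."""
    if not primary or len(primary) != len(secondary):
        raise ValueError("both lists must describe the same, non-empty set of children")
    if n_groups < 1 or len(primary) % n_groups != 0:
        raise ValueError("the children must divide evenly into the groups")
    # Indices of the children, ordered from highest to lowest primary score.
    order = sorted(range(len(primary)), key=lambda i: primary[i], reverse=True)
    size = len(order) // n_groups
    primary_means, secondary_means = [], []
    for start in range(0, len(order), size):
        members = order[start:start + size]
        primary_means.append(sum(primary[i] for i in members) / size)
        secondary_means.append(sum(secondary[i] for i in members) / size)
    return primary_means, secondary_means


# Hypothetical scores for ten children (not Bagley's data).
motor = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
marks = [70, 72, 74, 76, 78, 80, 82, 84, 86, 88]
motor_means, mark_means = grouped_means(motor, marks)
```

Calling `grouped_means(marks, motor)` gives the reverse classification, in which the school marks are ranked first. When one series of averages falls while the other rises, as in this made-up example, the two measures are inversely related.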

The probable explanation of the negative result of the 
comparison which was made in the study by Bagley is that 
the children who were compared with reference to their class 
standing and to their motor ability differed in age, but were 
not representative of all the children of their respective ages. 
The older children of a class, for example, are known to be 
lower in general academic ability than the younger children. 
This is because the older children are the ones who are re- 
tarded, due to their dullness, and the younger ones those who 
are accelerated because of their unusual brightness. The 
older children, because of their mere age, on the other hand, 
are superior to the younger children in their motor develop- 
ment. This superiority of the older children in motor devel- 
opment, coupled with their lack of superiority or actual 
inferiority in academic ability, produces the appearance of 
inverse correlation or of opposition between motor abilities 
in general and class standing. While later studies have 
indicated that the relationship between motor ability and 
general academic ability is slight, they do not confirm this 
finding of a negative correlation. 
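
The way in which mixed ages can manufacture an inverse correlation may be illustrated with a small numerical sketch. Everything in it is assumed for the sake of illustration: the six pupils and their scores are invented, and the product-moment formula in `pearson_r` is the ordinary textbook one rather than anything reported by Bagley.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 long")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a list with no variation has no correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical class: three younger (accelerated) and three older (retarded) pupils.
younger_motor, younger_marks = [50, 52, 54], [90, 88, 90]
older_motor, older_marks = [60, 62, 64], [72, 70, 72]

r_younger = pearson_r(younger_motor, younger_marks)
r_older = pearson_r(older_motor, older_marks)
r_pooled = pearson_r(younger_motor + older_motor, younger_marks + older_marks)
```

In this invented class the correlation is zero within each age group, yet pooling the two groups gives a coefficient of about -.95, simply because the older pupils are higher in motor score and lower in marks.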


This study, then, illustrates two points. First, if any 
relationship between motor ability and general academic 
ability exists, it is very slight, and motor ability therefore is 
not a suitable subject of testing if we wish to measure gen- 
eral intellectual capacity. Second, it is necessary to adopt 
the most careful technique in the administration and inter- 
pretation of the results of tests. We shall have to consider 
the demands of technique in considerable detail in the course 
of our later discussion. 

We may now return to the Columbia tests and close our 
account of the experimentation in the United States during 
this early period by a summary of the results of the experi- 
ments reported by Wissler, in his monograph in 1901.1 Be- 
cause of the historical importance of these Columbia tests, 
and in order to indicate in somewhat more detail the character 
of the early tests, we may reproduce the list of the Colum- 
bia tests in full. The list of traits or capacities which were 
measured in the Columbia tests is as follows: 


Length and breadth of head. 

Strength of hand. 

Fatigue as measured by an instrument called the dyna- 
mometer. 

Acuity of vision. 

Color vision. 

Acuity of hearing. 

Pitch discrimination. 

Weight discrimination. 

Discrimination of two points on the skin by the esthesiometer. 

Pain sensation. 

Perception of size. 

Color preference. 

Reaction time. 

The rate of the perception and reaction as measured by the 
rapidity of crossing out a’s in a text. 

1 Clark Wissler, Correlation of Mental and Physical Tests. Psychol. 
Monog., vol. 3. 1901. 

The rapidity of naming colors. 

Rate of movement as measured by dotting in one centimeter 
squares with a pencil. 

Accuracy of movement as measured by striking dots with a 
pencil. 

Perception of time as measured by the ability to follow rhythm 
one second after the sound has ceased. 

Association as measured by free associations to nine words. 

Imagery as measured by the imagery test of Galton. 

Memory as measured by four simple memory tests. 


The memory tests involved the immediate repetition of 
numbers which were seen, the immediate repetition of 
numbers heard, the repetition of a passage, and the ability 
to remember the length of a line which had been seen in the 
early part of the test period. 

It will be seen that the character of the tests is similar to 
the character of the early tests which were listed by Cattell 
in his article in Mind. They are somewhat more elaborate 
in that a larger number of tests are used, but they are nearly 
all tests either of the accuracy of sense discrimination, or the 
strength or rapidity of movement. The last three tests, 
those of association, imagery, and memory, are somewhat 
more complex in nature than the others. They give a sug- 
gestion of the type of test which has since predominated. 
They were, however, very incompletely developed, as com- 
pared with the tests which are in present use. 

The results of the tests which are of most interest to us 
concern the correlation between the various tests themselves 
and the correlation between the standing in the tests and 
college marks. The degree of correlation is expressed in a 
much more accurate fashion than in the earlier studies to 
which reference has already been made. The meaning of 
correlation will be explained more fully in the next chapter, 
but for the present we may merely say that it represents the 
closeness of correspondence between two traits. Correla- 
tion is represented by a coefficient. This coefficient may 
range from —1 to +1. Perfect agreement is represented by 
+1. If there is no ascertainable relationship between the 
scores in two tests, except what would be present by mere 
chance, the coefficient of correlation is zero. If the relation- 
ship is reversed, so that high standing in one test corresponds 
with low standing in the other, the coefficient becomes 
negative, and if this relationship is the extreme opposite of 
complete correspondence the coefficient becomes —1. 
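
The three landmark values of the coefficient can be shown with a short computation. This is a minimal sketch which assumes the ordinary product-moment formula, a formula the passage itself does not state; the function name `pearson_r` and the sample scores are hypothetical.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 long")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a list with no variation has no correlation")
    return sxy / math.sqrt(sxx * syy)


scores = [1, 2, 3, 4]
r_agree = pearson_r(scores, [2, 4, 6, 8])      # same order in both tests
r_reverse = pearson_r(scores, [8, 6, 4, 2])    # order exactly reversed
r_none = pearson_r(scores, [1, -1, -1, 1])     # no linear relationship
```

The first pair of lists gives perfect agreement, the second complete reversal, and the third a coefficient of zero.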

With this brief explanation, we can understand the signi- 
ficance of the coefficients which are reported. A few may be 
selected as typical examples. We may first mention a few 
correlations between the various tests themselves. 


TABLE II. CORRELATION BETWEEN CERTAIN OF THE COLUMBIA TESTS 


Reaction time and naming colors........................ .15 
Reaction time and association.......................... .08 
Marking a’s and naming colors.......................... .21 
Speed of movement and naming colors.................... .19 
Speed of movement and reaction time.................... .14 
Reaction time and marking a’s (approximately).......... 0 


We see from this list that the relationship between the 
scores in the different tests is very low. The highest of these 
correlations, .21, represents only a slight degree of relation- 
ship. The degree of correspondence which is represented 
by such a coefficient is so slight that one could not use the 
score in one test in any practical way to predict what the 
score of the individual in the second test would be. On the 
face of it, therefore, the abilities which are represented by 
the scores in these tests have very little relationship to one 
another. 

We may contrast the results of the mental tests with the 
relationship between two physical characteristics, height and 
weight. The correlation 
between these was .66. This means, for example, that if we 
knew which quarter of the entire group a person belonged 
in with reference to height, we could more often than not 
predict which quarter in reference to weight he would also 
belong in. This, of course, is not an extremely close corre- 
spondence, but it is close enough to make it possible to use 
the scores in one test to predict with enough accuracy for 
certain practical purposes what the score will be in another. 
The correlations between the various mental tests, however, are 
not sufficiently close to make this possible. 

The next comparison that we may make is between the 
standing in the tests and in the college classes. The stand- 
ing in each test was compared with the average of the marks 
in all of the courses taken by the individual student. This 
average class standing was found to be correlated with 
the standing in the test, as follows: 


TABLE III. CORRELATION BETWEEN AVERAGE CLASS STANDING 
AND A NUMBER OF MENTAL TESTS 


Class standing and reaction time....................... -.02 
Class standing and marking a’s......................... -.09 
Class standing and association time....................  .08 
Class standing and naming colors.......................  .02 
Class standing and logical memory......................  .19 
Class standing and auditory memory.....................  .16 


It appears that the tests were not more closely related 
to class standing than to each other. The highest correla- 
tion is between class standing and logical memory, and this 
is negligible so far as the use of the tests for diagnosis is 
concerned. In contrast to these low correlations is the 
correlation between the standing in the various subjects 
themselves and between average class standing and gym- 
nasium work. These correlations are given in the next 
table. 
TABLE IV. CORRELATION BETWEEN THE STANDING IN THE 
VARIOUS COLLEGE SUBJECTS 


Latin and Mathematics.................................. .58 
Rhetoric and Mathematics............................... .51 
Rhetoric and Latin..................................... .55 
Rhetoric and French.................................... .30 
Rhetoric and German.................................... .61 
Mathematics and German................................. .52 
Latin and French....................................... .60 
Latin and German....................................... .61 
Latin and Greek........................................ .15 
Average class standing and gymnasium grades............ .53 


It is evident that the low correlation which was found be- 
tween the tests themselves and between the tests and class 
standing is not to be explained by a deficiency in the tech- 
nique of finding correlation. The high correlations between 
the standing in the college subjects preclude this explana- 
tion. The reason for the low correlation must be found in 
the nature of the tests themselves. 

There are two possible explanations of the low correlations. 
They may be due either to the nature of the mental processes 
which are being tested, or to the faults in the technique of 
the organization or administration of the tests. Before we 
can go fully into the discussion either of the content of the 
tests, or of the technique of their organization and adminis- 
tration, it is necessary to discuss these matters more fully 
than is possible at this point. We must content ourselves 
for the present, therefore, with merely hinting at the possible 
explanation of the negative results of these tests. The 
fuller contrast between these earlier tests and the subsequent 
ones will be more thoroughly appreciated after the later 
stages in the development of tests have been described. 

For the present, it is sufficient to say that the low correla- 
tions which are here reported were probably due, in part, to 
the fact that the mental processes which were tested were 
chiefly the sensory and motor processes. Nearly all of the 
experimental work with tests has demonstrated the fact that 
these mental abilities are not closely related to one another, 
or to the complex activities which comprise achievement in 
the school, or to achievement in vocational activity outside 
the school. The low correlations may also be due in part to 
the fact that the tests must have been given somewhat 
hastily, since the entire series was completed within an hour. 
It is possible, therefore, that an individual’s score in any 
particular test was not a stable measure of his capacity. 
If the test had been given a second time, his standing might 
have been altered seriously. Since the consistency of the 
tests was not measured, we cannot say how far this supposi- 
tion is correct in this particular case. Later experience, 
however, indicates that the scores in individual tests are 
often not very constant. If this is true, it explains in part 
the low correlations between tests, and between the tests 
and college marks. 
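
The effect of inconstant scores on a correlation can be made concrete with the attenuation relation of later test theory, which is usually associated with Spearman and is not stated in this passage. The sketch below is offered only as an illustration: the function name `attenuated_r` and the figures of .50 and .40 are hypothetical, and no reliabilities were measured for the Columbia tests.

```python
import math


def attenuated_r(true_r, reliability_x, reliability_y):
    """Correlation expected between two fallible measures, given the
    correlation between the underlying traits and each measure's reliability."""
    for rel in (reliability_x, reliability_y):
        if not 0.0 <= rel <= 1.0:
            raise ValueError("a reliability must lie between 0 and 1")
    return true_r * math.sqrt(reliability_x * reliability_y)


# Hypothetical case: traits correlated .50, each test with a reliability of .40.
observed = attenuated_r(0.50, 0.40, 0.40)
```

Under these assumed figures a true relationship of .50 would appear as an observed coefficient of only about .20, so that unstable scores alone could account for a good part of a low correlation.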


3. Early European experiments with tests 


We have already seen how the interest in mental tests as 
measures of individual differences was the outgrowth of the 
work of psychologists in the experimental laboratory. A 
number of European psychologists were carrying on experi- 
ments with tests during the decade from 1890 to 1900, which 
paralleled, in the main, the experiments of American psy- 
chologists. Among the most prominent of these Europeans, 
both because of the amount of work which he was doing at 
the time and because of the importance of the later outcome 
of his experiments, was the French psychologist, Alfred 
Binet. 

It is instructive to notice that Binet’s earlier work was 
apparently as lacking in immediate productiveness as was 
that of the American psychologists. He was using very 
much the same kind of tests, and interpreted their results 
with very much the same sort of rough methods. There are 
to be found in his work, however, germs of the characteristics 
which were responsible for the success of his later experi- 
ments. These appear in the practical character of his aims 
and interests, and in some measure in the character of the 
mental processes which he was trying to measure and which 
were represented in the tests of his later scale. We may 
glance briefly at two of the articles in which he reported his 
early work. 

In 1895, Binet 1 proposed a list of tests, much as did the 
American psychologists of the same period. He did not 
report the result of the applications of the tests, nor did he 
report in detail methods by which the tests could be scored. 
His publication, therefore, was lacking in immediate pro- 
ductiveness. It is interesting, however, as indicating that 
Binet was experimenting with tests somewhat different in 
character from the tests of sensation and of movement which 
characterized the American work. We may illustrate by a 
few examples from the list which he presented. Binet first 
suggested four tests of memory. Two of these were later 
used in his scale. One was the memory of geometrical 
designs, a second tested the memory for a short paragraph, 
and a third tested immediate memory for numbers. A 
second test was designed to measure the character of the 
individual’s mental images. Another series of tests was 
designed to measure attention, either the uniformity of 
attention or the number of the objects or ideas which could 
be kept in mind at one time. Another group of tests were to 
measure what Binet described as comprehension. Other 
groups were designed to measure suggestibility, esthetic 
feeling, and moral sentiments. 


1 A. Binet and V. Henri, “La psychologie individuelle”; in Année 
psychol., t. 2, pp. 411-65. 1895. 


It will be seen that Binet’s proposals, in contrast to those 
of some of his fellow psychologists, were very vague, and that 
the exact means by which the abilities which he described 
were to be tested had not been worked out by him. He had 
not as yet developed a technique to measure some of the 
functions which he listed. The success of the testing move- 
ment, however, has been due in part to the efforts to meas- 
ure some of these more complex mental processes, in 
contrast to the simpler ones studied in the early American 
work. 

Binet himself experimented with a number of tests of 
the simpler type, which were devised and given by him for 
the purpose of measuring attention and adaptation.1 The 
practical cast of Binet’s mind was indicated by the fact that 
he gave these tests to two groups of children, six being the 
poorest from a class of thirty-two, and five being the best. 
He selected these two groups in order that he might deter- 
mine which tests served to differentiate the bright from the 
dull pupils. 

The first test measured tactual discrimination by means 
of the esthesiometer. This test was given in three forms. 
In the first form the bright children excelled the dull ones. 
In the last form, however, which was somewhat more ac- 
curate and in which there had been opportunity for some 
practice, the difference between the two groups was very 
slight. The second test was a measure of reaction time. 
Binet’s results agreed with those of Gilbert in that there was 
little difference between the scores of the two groups of 
children. The next test consisted of counting small points 
placed close together. Here again the difference between 
the two groups was small or non-existent. In the next 
test the child listened to the count of the beats of an instru- 


1 A. Binet, “Attention et adaptation”; in Année psychol., t. 6, pp. 248-404. 1899. 
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ment called the “beater.” He was required to tell whether 
or not, at a predetermined given time, the rate of the beater 
was changed. In this test the dull children exceeded the 
bright ones. This result was explained by Binet as due to 
the fact that the children knew when to pay attention. He 
believed if the warning had not been given, the bright chil- 
dren would have exceeded the dull ones. 

In the remaining tests the bright children, in general, 
excelled. The first one consisted in counting the beats of a 
metronome. Here the bright children made higher scores 
than the dull ones, though the dull children improved more 
than the bright children. The next test consisted in copying 
various sorts of printed material, and the measure was the 
amount which could be copied at one act of observation. 
The bright children in each case excelled the dull children in 
this test. The bright children also made fewer errors than 
the dull ones in reproduction of letters or numbers from 
memory. In the reproduction of a design seen for a brief 
space of time, the bright children gave better reproduction 
than the dull ones. In the test of crossing a’s, the dull 
children made more errors than the bright ones. A similar 
result came from the test in simultaneous adding. The 
speed of the two groups was approximately the same, but the 
number of errors was greater in the case of the dull children. 
In speed of reading and copying sentences, the dull children 
were equal to the bright ones. 

This experiment would, of course, not at all meet our 
present statistical standards for an experiment in tests. Its 
results can be regarded as merely suggestive. If we take the 
results of the comparison of the two groups of children as a
whole, we see that those tests in which the mental processes
were more complex did, in general, differentiate the two 
groups better than those in which the mental processes were 
simpler. This undoubtedly impressed Binet and led him to
the selection of the more complex type of tests for his later
scales. 

Little work of any note, aside from that of Binet, was
done in Europe during this early period. Some experimentation
with tests was stimulated by Kraepelin, who was interested
primarily in means of diagnosing insanity. A. Oehrn,1 in 1895,
published a report of a few tests and gave
data from which their correlation could be calculated. 
These tests dealt with perception, which was measured by 
counting letters, crossing letters, and proof-reading; with 
memory; with simple association processes, such as are re- 
quired in adding; and with several motor functions, such as 
writing from dictation and reading simple material. Krue- 
ger and Spearman calculated the correlation between the 
scores in these tests, and found them to range from .44 to 
.69, aside from the cases in which no correlation was found. 
This experiment is interesting merely because it was repre- 
sentative of the earlier work, and proved somewhat more 
successful than some of the other experiments which were 
made at this time. 

A marked exception to the unsuccessful attempts of the 
earlier tests was the invention of the so-called “completion
test” by Ebbinghaus, in 1897.2 Ebbinghaus aimed primarily
to find a measure of intellectual fatigue, and he therefore
selected a mental process which he thought would represent
the higher intellectual activities. His analysis led
him to believe that the combining activity of the mind was 
the highest. In order to measure this activity, he devised a 
test in which the subject was shown a text with certain words 
left out. He believed that if the individual had the capacity
in general to put together the items of his experience in
such a way as to see their relationship, this capacity would be
measured by his score in filling in these blanks. While his
device did not prove very successful as a measure of general
intellectual fatigue, it did prove valuable as a measure of
general intellectual capacity. It was tried out with children
of different levels of school achievement in Germany, and
has since been very extensively used by European and
American psychologists.

1 A. Oehrn, “Experimentelle Studien zur individuelle Psychologie”; in
Psychol. Arbeiten, B. 1, pp. 92-151. 1895.

2 H. Ebbinghaus, “Ueber eine neue Methode zur Prüfung geistiger
Fähigkeiten und ihre Anwendung bei Schulkindern”; in Zeitsch. f. Psychol.,
B. 13, pp. 401-59. 1897.

The work of Francis Galton in the investigation of indi- 
vidual differences, for the purpose of tracing the inheritance 
of mental characteristics, has already been mentioned. 
While this work did not have much direct influence upon 
testing in the schools, it did have a large amount of indirect 
influence. It is interesting to notice in this connection that 
Galton’s successor in the Eugenics Laboratory in London, 
Karl Pearson, formulated a method of calculating correla- 
tion which is now in wide use in the field of mental tests. 

The interest in individual psychology was also promoted 
by the work of William Stern.1 While Stern did not, in this
early period, develop mental tests, he made a study of the 
intellectual characteristics of prominent men — men who 
were known for unusually high intellectual achievement in 
some line of endeavor. He attempted to analyze the spe- 
cial capacity which was possessed by these men by the bio- 
graphical and questionnaire methods. Stern later be- 
came interested in the application of tests of intellectual 
ability. 

We have now reviewed briefly the typical tests which are 
characteristic of the early period, which closed about 1900. 
We may summarize briefly the outstanding characteristics 
of these early tests. 


1 W. Stern, Ueber Psychologie der individuellen Differenzen. 1900.


4. Summary of the early period

The interest in tests during this period was largely theo- 
retical. It was the outgrowth of the work of the psycholo- 
gist in the psychological laboratory. It was related to the 
general interest in individual differences, and this again was 
related to the inheritance of mental traits. There were, 
to be sure, some experiments in the application of tests to 
school children, and these experiments foreshadowed, in a 
measure, the practical applications which have been made 
of tests during the past two decades. 

Most of the tests of the early period, as has already been 
pointed out, were single tests. They were not organized
into scales. If a number of tests were given at the same 
time, the scores which were made in these tests were not 
combined. This again constitutes a marked contrast be- 
tween the earlier tests and most of those which are now in 
use. 

In the next place, the early tests were not standardized in
the sense in which we use the word at the present
time. No careful method was used to determine whether or
not the tests were reliable. Reliability may depend upon
the way in which the test is given, or the way in which the
response by the individual is scored, or upon the general
conditions under which the test is taken. We now have careful
methods of determining whether or not a test is thoroughly
standardized. Furthermore, certain practices have
been determined upon as constituting good standardization. 
In this respect again, then, the early tests are in contrast 
with those which were later developed. 

In addition to the fact that there was no careful procedure 
by which the reliability of a test was determined, there was 
no well-recognized method for accurately determining the 
significance of a test by comparing the scores made in it with 
other measures of achievement. Some comparisons, to be
sure, were made, such, for example, as those of Bolton, of
Gilbert, and of Binet, but these comparisons were made on a 
small scale in many cases, and without the elaborate statis- 
tical technique which is now customary. 

So far as the content of the early tests was concerned, it 
dealt mostly with the sensory and the motor processes. In 
some cases simple tests of memory were used, and in a few
cases, such as some of the tests of Binet, and the Ebbinghaus
test, the higher mental processes were included in the
measurement. These, however, are exceptions to the general rule.

As there was no highly organized method of comparing 
the results of tests with other methods, so the method of 
systematic comparison of results was not used as a means of 
selecting the tests. The selection of tests, on the contrary, 
was based largely upon a preliminary analysis of the mental 
process which it was desired to measure. This preliminary 
analysis is not to be criticized in itself. It has, however, been 
shown to be inadequate as a sole method or criterion for 
selection. It is probably true that the further advancement 
of tests will depend in a measure upon a more acute analysis 
than we have been able to make at this time of the mental 
capacities. However, this analysis must be backed up by a 
careful statistical examination of the result. 

Finally, the results of the early tests, so far as we may 
judge by the comparison with other measures of achieve- 
ment, were for the most part negative. This negative out- 
come is illustrated, for example, in Wissler’s report of the 
Columbia tests. The consequence of this negative outcome 
is that tests for a time fell into a distinct disfavor on the 
part of professional psychologists. There was some experi- 
mentation with tests during the succeeding few years, but 
this was sporadic and did not elicit the interest of psycho- 
logists as a body. The type of prevailing interest on the part 
of professional psychologists is well illustrated by the
appointment of a new committee on tests, in 1906.1 This new
committee, in contrast to the earlier one, took as its purpose
not the organization of an elaborate series of tests, but the
minute standardization of a few tests of simple sensory and
motor processes. The work of this committee made a
valuable contribution to the measurement of these simple
processes, but did not contribute towards development in
the direction of testing the more complex or the higher
mental activities.

In the next chapter we shall discuss the further development
of single tests, particularly as it is related to the
correlation technique.


CHAPTER III
THE APPLICATION OF THE CORRELATION METHOD


Methods of investigating correlation had been used in the
early period of the development of tests. These methods, 
as has already been pointed out, were crude, and they had 
the defect that it was not possible to establish the degree of 
correlation by means of a single numerical quantity. As we 
have seen, there was one investigation in which the coeffi- 
cient of correlation was calculated, namely the study by 
Wissler. In this case, the correlation was calculated only 
after the tests had been given. It was not used in the de- 
velopment of the tests themselves. 

We may pause for a moment to explain in brief the gen- 
eral meaning of correlation. It expresses the degree of 
correspondence between two traits, such, for example, as 
height and weight, or musical ability and artistic ability. 
It is only possible to measure such relationships by a com- 
parison of the amount of ability possessed by the various 
individuals of a group. In other words, correlation is en- 
tirely a comparative affair. 

We may take as a simple example the two characteristics 
of height and weight. If all the individuals of a group were
compared in both these respects, it would be found that on 
the whole the taller persons are also the heavier. If, now, 
this correspondence were complete there would be a perfect 
correlation between height and weight. One way of ex- 
pressing this correspondence is to say that the tallest person 
of the group is also the heaviest, and the shortest person the 
lightest. This expresses the relationship in terms of rank-order.
We may, however, make a somewhat more exact
comparison by calculating the average height and weight
of the members of the group, and then representing the 
height and weight of each individual in terms of the degree 
of variation above or below the average. If, now, an indi- 
vidual varies from the average in height by an amount which 
corresponds exactly to his variation from the average in 
weight, and this is true of all the individuals in the group, 
we say that the correlation is perfect. 

It might, of course, be a priori possible that we should 
find two traits in which there was not only a lack of positive 
correspondence, but even an inverse relationship. If the 
persons who stood at the top of the series in reference to one 
trait stood at the bottom in reference to the other, and if all 
the other individuals stood in this relation of opposition, we 
would then say that there was a complete inverse relation- 
ship or negative correlation between the two traits. This 
represents the other extreme. 

Many degrees of correlation may exist between these two 
extremes. There may be a high degree of positive correla- 
tion, but not a perfect one; there may be a high degree of 
negative correlation; or there may be low degrees of negative 
or positive correlation. If no determinable relationship 
exists of either positive or negative character, we must con- 
clude that the relation between the two traits is one of 
chance. 

This is not the place to enter upon the description or
the discussion of the means of calculating the correlation
between traits. We may simply refer again to the quantitative
expressions which are used to represent the degree
of correlation, and which were given on page 46. The foregoing
statement may suffice to indicate the nature of correlation
for the purpose of interpreting its use in the development
of mental tests.
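
Although the calculation is not developed in this chapter, a small numerical sketch may make the two extremes and the intermediate case concrete. The heights and weights below are invented for illustration, and the function name `correlation` is our own; the sketch assumes the usual form of the coefficient, in which each score is expressed as a deviation from the group average.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation computed from deviations about the two averages."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]   # deviation of each person in trait 1
    dy = [y - mean_y for y in ys]   # deviation of each person in trait 2
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    return sum_xy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Hypothetical heights (inches) for five persons.
heights = [60, 63, 66, 69, 72]

# Weights that rise exactly in step with height: complete correspondence.
weights_direct = [100, 115, 130, 145, 160]
# Weights that fall exactly in step with height: complete inverse relationship.
weights_inverse = [160, 145, 130, 115, 100]
# Weights that agree with height only in part.
weights_mixed = [110, 100, 140, 125, 150]

r_direct = correlation(heights, weights_direct)
r_inverse = correlation(heights, weights_inverse)
r_mixed = correlation(heights, weights_mixed)

print(r_direct, r_inverse, round(r_mixed, 2))
```

The first two coefficients stand at the two extremes described above, and the third falls between them.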

We have set in contrast the earlier and the later tests on
the ground that the former were interpreted by means of
rough or crude methods of finding correlation, whereas in 
the latter the interpretation was based upon more refined 
methods. Another distinction between the two periods is 
that while, in the case of the earlier tests, correlation was 
applied after the tests had been given, and for the purpose 
of interpreting results, in the later period correlation was 
also applied while the tests were being developed, in order 
that it might be determined which tests were good and which 
were poor. In other words, correlation became a part of the 
technique in the design and organization of a test. 

By means of correlation it is first determined whether a 
test gives consistent results. For this purpose the scores on 
the test are correlated with the scores in the same test given 
a second time. The coefficient which results is called the 
reliability coefficient. In the second place, the signifi- 
cance or meaning of the test is examined by finding the 
correlation between the scores on this test and the scores on 
other tests. This indicates to what extent the various tests 
measure the same capacities. This is a factor in the inter- 
pretation of the meaning of the test scores and in the design 
of composite scales. Finally, the test is examined by finding 
the correlation between scores in the test and some outside 
measure altogether. We shall see how these criteria are 
applied in greater detail in the course of the discussion. 
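
The three uses of correlation just named may be sketched as follows. All of the scores are hypothetical, the variable names are our own, and the sketch assumes the ordinary correlation coefficient computed from deviations about the averages; it is an illustration of the procedure, not a record of any actual test.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation computed from deviations about the two averages."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    return sum_xy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Hypothetical scores of six pupils.
first_trial = [12, 15, 9, 20, 17, 11]       # the test as first given
second_trial = [13, 14, 10, 19, 18, 10]     # the same test given a second time
other_test = [30, 34, 25, 33, 38, 28]       # a different test
teacher_estimate = [3, 4, 2, 5, 4, 2]       # an outside measure

# 1. Consistency: the test correlated with itself (the reliability coefficient).
reliability = correlation(first_trial, second_trial)
# 2. Overlap with another test: do the two measure the same capacity?
with_other_test = correlation(first_trial, other_test)
# 3. Agreement with a measure outside the tests altogether.
with_outside_measure = correlation(first_trial, teacher_estimate)

print(round(reliability, 2), round(with_other_test, 2), round(with_outside_measure, 2))
```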


1. Spearman’s criticism of statistical procedure 


The advance to the more precise method of standardizing
tests and of calculating their results is represented in the
writings of Charles Spearman. Spearman, in 1904, published
an article entitled “‘General Intelligence’ Objectively
Determined and Measured.”1 In this article he reviewed
in a critical way the previous tests, outlined the main
problems for study, and indicated the technique by which
these problems might be attacked. His criticism of the
previous work is summed up under four heads. It was
defective, he says, first, because the investigators failed to
use precise quantitative expressions to represent the degree
of correlation between tests, or between tests and other
measures. The previous work failed, in the second place,
because it did not include a calculation of the probable error
of the correlation. In the third place, it did not eliminate
certain irrelevant or falsifying factors which might give a
misleading correlation which was too high or too low.
Finally, in the fourth place, it did not allow for errors in
observation. We may review briefly each of these criticisms.

1 C. Spearman, “‘General Intelligence’ Objectively Determined and
Measured”; in American Journal of Psychology, vol. 15, pp. 201-92.
1904.

Spearman did not, himself, as we have already seen, invent 
the mathematical formula for calculating correlation. The 
formula had already been devised by Karl Pearson, who 
modified a previous method developed by Bravais. This 
method had already been used by Wissler, and by Aiken, 
Thorndike, and Hubbell. Pearson’s method was called the
product-moment method.

While the product-moment method can be applied by
one who knows no higher mathematics, it does require rather
elaborate calculation. Spearman contributed to the ease of
finding correlation by presenting two simpler formulæ. One
of them is called the Rank Method, and is a method of finding
correlation by ranking instead of by calculating the variations
of individual scores from the average. A shorter or
Footrule Method is dependent also upon the procedure of
ranking. These two simple formulæ of Spearman have been
very widely used where the number of cases is rather small
and one does not desire a very precise calculation.

Spearman’s contribution, however, is not so much in the
production of these formulæ as in the emphasis which he
placed upon the need of precise methods of calculation. The 
rougher methods, such as were used by Bolton, Gilbert, and 
Bagley, can be used to determine whether a marked degree 
of correlation exists; but to determine whether there is more 
or less correlation between two pairs of traits, or precisely 
what the degree of correlation is, it is necessary to express 
the degree of correlation in a single numerical quantity. It is 
this which the methods of Pearson and Spearman enable us 
to do. 
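
The two ranking formulæ of Spearman may be sketched as follows, to show how a single numerical quantity is obtained from rank-order alone. The formulæ are given in the forms in which they are commonly stated, which we assume here rather than take from the text: for the Rank Method, one minus six times the sum of the squared rank differences divided by n(n² − 1); for the Footrule, one minus six times the sum of the gains in rank divided by (n² − 1). The scores and function names are hypothetical, and tied scores are not provided for.

```python
def ranks(scores):
    """Rank scores from 1 (highest) downward; assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def rank_method(xs, ys):
    """Rank Method: based on the squared differences between the two ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

def footrule(xs, ys):
    """Footrule Method: based only on the gains in rank from one test to the other."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    gains = sum(b - a for a, b in zip(rx, ry) if b > a)
    return 1 - 6 * gains / (n * n - 1)

# Hypothetical scores of five persons in two tests.
test_a = [50, 40, 30, 20, 10]
test_b = [45, 48, 22, 30, 5]

rho = rank_method(test_a, test_b)
r_footrule = footrule(test_a, test_b)
print(rho, r_footrule)
```

The two methods do not yield the same figure for the same data; the Footrule is the quicker to compute and the rougher of the two.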

The expression of the degree of correlation between traits 
in a single coefficient also made possible the comparison of 
the degrees of relationship between the various pairs of a 
whole group of traits. This led to the study of the interrela- 
tionship between mental capacities on a large scale, and also 
to the attempt to interpret the cause of the interrelation- 
ships which are found to exist. Spearman introduced this 
study, in association with Krueger, in an article which ap- 
peared in 1906.1 He was followed rapidly by several other 
investigators who made a larger number of tests. We shall 
have to consider the several studies which grew out of this 
investigation in a later section. 

Spearman emphasized, in the second place, the necessity 
of calculating the probable error of the coefficient of corre- 
lation, or, in more general terms, of determining how reliable 
a coefficient is. Pearson’s formula for calculating the degree 
of probable error of a coefficient was already in existence, but 
little use had been made of it. There are two main factors 
which affect the probable error of the correlation coefficient. 
In the first place, the number of cases is an important factor.
If there are only a few persons in the group for which the
correlation is found, chance plays a large part in the size
of the coefficient. To put it in another way, if another group
of the same size is tested and the correlation between the
two tests found, the probability is very large that it will
differ considerably from the correlation of the first group.
The error which is caused by using a small number of cases is
called the error of sampling.

1 F. Krueger and C. Spearman, “Die Korrelation zwischen verschiedenen
geistigen Fähigkeiten”; in Zeitschrift f. Psychol., B. 44, pp. 50-114.
1906.

The second factor which affects the size of the probable 
error is the size of the correlation coefficient itself. The 
larger the coefficient is, the smaller the probable error will 
be. The size of the probable error, then, indicates how 
much we can rely upon a particular coefficient as being a 
stable measure of the relationship between the traits which 
are compared, so far as the merely statistical factors affect 
it. It is now customary always to calculate the probable 
error of a coefficient of correlation, and to report it with 
the coefficient itself. If the correlation coefficient is not 
at least four or five times as large as the probable error, 
one should place no reliance upon the coefficient. 
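
The dependence of the probable error upon the two factors just named may be sketched as follows. The text does not state the formula; we assume the form in which it is customarily given for the product-moment coefficient, namely .6745 (1 − r²) / √n. The function names and the example figures are our own.

```python
from math import sqrt

def probable_error(r, n):
    """Probable error of a correlation coefficient r found from n cases
    (customary formula, assumed here)."""
    return 0.6745 * (1 - r * r) / sqrt(n)

def is_dependable(r, n, ratio=4.0):
    """Apply the rule of thumb: the coefficient should be at least
    `ratio` times its probable error."""
    return abs(r) >= ratio * probable_error(r, n)

# More cases give a smaller probable error for the same coefficient.
pe_25_cases = probable_error(0.5, 25)
pe_100_cases = probable_error(0.5, 100)

# A larger coefficient gives a smaller probable error for the same number of cases.
pe_high_r = probable_error(0.9, 25)

# A coefficient of .30 from sixteen cases fails the rule of thumb.
small_group_ok = is_dependable(0.3, 16)
large_group_ok = is_dependable(0.5, 100)

print(round(pe_25_cases, 3), round(pe_100_cases, 3), round(pe_high_r, 3))
print(small_group_ok, large_group_ok)
```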

The converse of this statement, however, is not always
true. We cannot, in every case, rely upon a correlation
coefficient which meets these demands of statistical reliability.
That a coefficient is four or five times the probable
error does not mean that we can expect to get a coefficient
within the range of the probable error in half the cases when
we calculate the correlation between the same two traits
in the case of another group of persons. There are variable
factors, other than those of a statistical nature, which affect
the degree of correlation. For example, the army tests
have been applied in a large number of academic institutions,
and the correlation has been found between the scores in the
army tests and the marks in college classes. The amount
of correlation thus found has varied all the way from about
.3 to .7 or higher. We find, when we come to review
the results of the study of the correlation between men- 
tal traits, that the variations in the coefficients which are 
found are confusingly large. These variations make it 
necessary to be very cautious in the interpretation of the 
results of correlation, even though we may carefully follow 
the most reliable statistical method. They make it un- 
safe to place much reliance upon any single correlation. 
It is necessary to base our interpretations upon the gen- 
eral trend of correlation coefficients rather than upon any 
single measure. 

The causes of still other variations among correlation 
coefficients, which cannot be accounted for by an inadequate 
number of cases, are touched upon in Spearman’s remaining 
criticisms. He says, in the third place, that the calculated 
coefficient may be affected by other factors than the real 
relationship between the traits which are measured. It may, 
for example, be raised or lowered by kinship between the 
individuals who are tested, by differences or likenesses of 
the social level of the individuals, and possibly by differ- 
ences in attitudes or abilities which affect the score but 
which are not the thing which it is desired to measure, such 
as zeal, endurance, or manual dexterity. These may either 
produce an apparent correlation when none exists between 
the traits, or reduce the correlation coefficient below the true 
measure. 

Let us take age as an illustration of these irrelevant factors,
because it is clear that it does frequently affect test
scores in such a way that it will produce an error unless it is
properly accounted for, and because it is a factor which is
often overlooked. Table V illustrates the way in which
age may increase the apparent direct correlation between
quality and rhythm in handwriting.1 The score of these two
characteristics rises as the child grows older. The children
who are represented in the diagram are divided into three
age groups, of seven, ten, and fourteen years respectively.
Those of seven years of age are designated by X’s, those of
ten years by O’s and those of fourteen years by V’s. We see
that the X’s, which represent the younger children, are
grouped toward the lower left-hand side of the table. This
means that they make low scores in rhythm and also in
quality. The V’s, representing older children, are grouped
toward the upper right-hand corner of the table, showing
that they make high scores in both rhythm and quality.
The O’s fall in between these two groups.

TABLE V. SCATTER DIAGRAM TO SHOW THE EFFECT OF A CONSTANT
FACTOR (AGE) IN PRODUCING A SPURIOUS CORRELATION

(The X’s represent seven-year-old children, the O’s ten-year-old children,
and the V’s fourteen-year-old children.)

1 From a Master’s thesis by H. W. Nutt, entitled Rhythm in Handwriting.
Library of the University of Chicago. 1916.

The second fact which is noticeable in the table is that, if 
we take the symbols representing all of the individuals, 
without reference to the age distinction, we see that they 
form a group which is elongated along a diagonal line running
upward and toward the right. This indicates correlation,
because it means that those who are low in one test are
low in the other, and those who are high in one are high in 
the other. 

If now, in the third place, we look at each of these age 
groups by itself, we see that the symbols are not grouped 
along such a diagonal line. The X’s alone, or the O’s or 
V’s alone, are scattered promiscuously over a given area of 
the diagram. It is only when we include in our view the 
three age groups that we find the diagonal grouping to 
obtain. It is clear, therefore, that there is no marked direct 
relationship between quality and rhythm, but that, because 
each one is affected by age, there is an indirect relationship 
between them when we compare children of various age 
groups together. An indirect relationship of this sort pro- 
duces what is called a spurious correlation. 
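
The effect described may be reproduced with a small constructed example. The scores below are not those of the table; they are invented so that, within each age group, quality and rhythm have no relationship whatever, while the general level of both rises with age. The names and the score levels are hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation computed from deviations about the two averages."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    return sum_xy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Four children in each group, scattered evenly about the group's level,
# so that quality and rhythm are unrelated inside the group.
offsets = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
group_levels = {"seven": 2, "ten": 5, "fourteen": 8}   # hypothetical levels

groups = {
    age: ([level + a for a, _ in offsets], [level + b for _, b in offsets])
    for age, level in group_levels.items()
}

# Correlation inside each age group taken by itself.
within = {age: correlation(quality, rhythm) for age, (quality, rhythm) in groups.items()}

# Correlation when all three age groups are thrown together.
all_quality = [q for quality, _ in groups.values() for q in quality]
all_rhythm = [r for _, rhythm in groups.values() for r in rhythm]
pooled = correlation(all_quality, all_rhythm)

print(within, round(pooled, 3))
```

Each group alone shows no correlation, yet the pooled coefficient is high, because age raises both scores together.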

Spearman pointed out that such factors as age may serve
to increase the correlation between two traits or to decrease
it. The increase is produced when the common factor
affects both traits alike, and a decrease is produced if it
affects one and not the other. In addition to pointing out
this fact, Spearman presented a formula for determining
what the real correlation would be if the
irrelevant factor did not exist. That is, the formula gives a
corrected coefficient which is derived from the correlation
between the two processes and between each process and
the irrelevant factor.1 His formula has been superseded
by a formula for calculating partial correlation — that 
is, the correlation between two factors which are affected 
by a third factor, if the third factor is assumed to remain 
constant. 
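
The partial correlation formula is not written out in the text. A sketch of it, in the standard form for one constant factor, is given below; the three coefficients supplied to it are hypothetical figures chosen for illustration, and the function name is our own.

```python
from math import sqrt

def partial_correlation(r12, r13, r23):
    """Correlation between traits 1 and 2 with factor 3 held constant,
    derived from the three ordinary coefficients (standard formula)."""
    return (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Hypothetical case: quality and rhythm correlate .60, but each is
# strongly related to age (.80 and .75). With age held constant,
# nothing remains of the apparent relationship.
spurious_case = partial_correlation(0.60, 0.80, 0.75)

# Hypothetical case: the third factor is only moderately related to each
# trait, and a substantial correlation survives the correction.
genuine_case = partial_correlation(0.60, 0.50, 0.40)

print(round(spurious_case, 3), round(genuine_case, 3))
```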

Spearman’s criticism was important because it called 
attention to the complication of factors which affect a cor- 
relation coefficient. It is this fundamental point which is 
important. It puts us on the watch for spurious factors. 
When they have been discovered their effect may be elimin- 
ated by the application of the partial correlation formula, or 
it may be avoided by the initial selection of the persons who 
are to be tested. For example, the disturbing effect of age 
may be avoided by choosing children who are all of the same 
age. This is perhaps the preferable procedure when a large 
enough number of cases can be secured. 

1 This description fits the 1904 article. Spearman subsequently modified
the formula.

Spearman’s fourth criticism had to do with the effect of
chance factors or errors in measurement or observation upon
the correlation coefficient. He pointed out that if the test
itself is inaccurate, or if both tests are inaccurate, the
apparent coefficient is lower than it would otherwise be. The
coefficient which is found by using the inaccurate raw scores
he called the raw coefficient. He devised and presented a
formula for determining what the true relationship would be
if the tests themselves were accurate. This he called the
true correlation. The application of this formula is called
correction for attenuation. The assumption underlying the
correction formula is that the true correlation will be higher
than the raw correlation in proportion as the original tests
are inaccurate.
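
The correction may be sketched in the form in which it is usually stated, which we assume here: the raw coefficient is divided by the square root of the product of the reliability coefficients of the two tests. The figures and the function name are hypothetical.

```python
from math import sqrt

def correct_for_attenuation(raw_r, reliability_x, reliability_y):
    """Estimate the correlation that would be found if both tests
    were perfectly accurate (usual form of the correction, assumed here)."""
    return raw_r / sqrt(reliability_x * reliability_y)

# Two rather inaccurate tests: the raw coefficient of .40 is raised considerably.
corrected_unreliable = correct_for_attenuation(0.40, 0.64, 0.64)

# Two perfectly consistent tests: the raw coefficient stands unchanged.
corrected_reliable = correct_for_attenuation(0.40, 1.0, 1.0)

print(round(corrected_unreliable, 3), corrected_reliable)
```

The less reliable the tests, the larger the correction, which agrees with the assumption stated above.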

Here again Spearman’s criticism is probably more im- 
portant than his correction formula. Certainly, from the 
point of view of individual diagnosis, it is of no value to 
know that the correlation would be higher if the tests were 
more accurate. If the test is inaccurate the ratings 
of individuals will be unreliable. Whether the formula 
has a greater theoretical than practical importance may 
be a debatable question. At any rate, from the prac- 
tical point of view, the chief importance of this criticism 
is in calling attention to the necessity of so perfecting our 
tests that they are accurate measures of the trait which 
they do represent. If a test is found to be unstable, the 
correct procedure is to perfect it until it does give consist- 
ent results. 

The method of determining the accuracy of a test is to 
give it twice, and then find the correlation between the two 
sets of scores. This is called the reliability coefficient. If the 
reliability coefficient is high, the test may be relied upon as 
giving consistent results. If it is low, the results are incon- 
sistent, and no very sound conclusions can be drawn from 
the tests either with reference to individuals or with refer- 
ence to the magnitude of the real correlation. 

This last criticism of Spearman’s has been influential in 
leading to a refinement in the methods of giving tests which 
comprises a considerable part of what we call standardiza- 
tion. The necessity of such standardization is represented, 
for example, in Whipple’s Manual of Mental and Physical
Tests.1 In this book, which was published in 1910, are
described in detail the methods of giving some fifty single
tests. These methods have been carefully worked out and
are minutely described. The purpose of this standardization
of methods of giving tests is to make the score constant,
that is, to insure that, on successive tests, it will be a
measure of the same thing.

FIG. 1. THE SEASHORE AUDIOMETER

This is an illustration of one of the instruments which represent the standardization of tests.
Many such instruments are described by Whipple. (Courtesy of C. H. Stoelting Co.)

1 G. M. Whipple, Manual of Mental and Physical Tests. Baltimore:
Warwick & York, 1915.


2. Correlation studies of single tests


The critical discussion of Spearman stimulated a number
of intensive and elaborate studies of tests, particularly with
reference to correlation. The first, and one of the most
careful of these, was made by Cyril Burt,1 in 1909. We may
take this as typical of the studies on this subject.

Burt gave a series of twelve tests to two groups of boys, 
all of them about twelve years of age. He also secured from 
the teachers an estimate of the general mental capacity or 
intelligence of the boys. One of the groups was composed of 
students in a superior elementary school in England, and the 
other in a preparatory school. Burt’s tests included a num- 
ber which were characteristic of the earlier period of tests, 
namely sensory discrimination and motor ability. He also 
gave two sensori-motor ability tests, which required more 
complex response than the motor tests, several tests of 
association, and one which he called a test of voluntary at- 
tention. 

These tests, classified according to Burt’s description, are 
as follows: 


Sensory discrimination: 
(1) Esthesiometer test — discrimination of two points on 
skin. 
(2) Weight discrimination. 
(3) Pitch discrimination.
(4) Discrimination of length of lines. 
Motor tests: 
(5) Tapping. 
(6) Dealing cards. 
(7) Card-sorting. 
(8) Alphabet-finding. 
Association tests: 
(9) Immediate retention. 
(10) Mirror drawing. 
(11) Spot pattern. 
Voluntary attention:
(12) Dotting. 


¹ C. Burt, “Experimental Tests of General Intelligence”; in British
Journal of Psychology, vol. 3, pp. 94-177. 1909.

The card-sorting tests differed from the dealing tests in
this respect. In dealing cards they were put into piles with- 
out regard to the character of the cards themselves. In 
sorting, they were sorted according to the numbers on the 
cards. This involved a discriminative response each time a 
card was thrown into a pile. The alphabet-finding test was 
carried out in this way. A number of cards, each containing 
one letter of the alphabet, were placed before the subject 
in an irregular arrangement. The task was to pick out 
the letters in the alphabetical order and arrange them be- 
neath the original group. The mirror-drawing test required 
the subject to draw a figure which was seen in a mirror and 
therefore reversed. In the spot-pattern test the experimenter
showed, for a very short time, a figure marked off
into squares. At various intersections of this figure were
dots. The individual was then given a card which had sim- 
ilar lines upon which he was required to place dots in the 
same position as in the card shown him. In the dotting test 
the individual being tested was seated before a rotating disk, 
covered, except for a narrow slot, by a card. On the disk 
were placed dots in irregular positions. The individual was 
required to strike these dots with a pencil as they appeared 
one at a time through the slit. This required alertness and 
a rapid adjustment of the movement of the hand to the posi- 
tion of the dot. 

Burt worked out the technique of the administration of 
each of these tests with considerable care, and, in order to 
determine how consistent or reliable the scores of each test 
were, he gave each one twice, and then calculated the cor- 
relation between the two sets of scores for each group. 
These reliability coefficients varied considerably. The
lowest was .38, which was found in the card-sorting test in
the preparatory school group. The highest was .93, in the
test for memory of words and syllables, also in the preparatory
group. Eleven of the twenty-two reliability coefficients
which were reported were below .70, and the other
eleven were .70 or above. It will be remembered that a
coefficient which expresses positive correlation may vary
from zero to one. A correlation which approaches one,
therefore, is high, and one which approaches zero is low.

It is a matter of judgment as to how high a reliability co- 
efficient must be in order that the test may be regarded as 
satisfactory. Certainly, a coefficient as low as .50 repre- 
sents little reliability. Perhaps we may say that .70 is the 
lower permissible limit of such a coefficient. Anything 
below this means that the test gives very variable results 
when repeated, and one can place but little reliance upon a 
single score even with a correlation of .70. We may, how- 
ever, consider this as fairly satisfactory. We see, then, that 
about half of Burt’s tests met the requirements so far as reli- 
ability is concerned. 

Burt used these reliability coefficients also to calculate 
the true scores according to Spearman’s formula, which has 
already been mentioned. We shall not be concerned with 
these, but shall use only his raw scores. 

The next question we may ask concerning Burt’s results is,
which of the tests gave high correlations with the other tests 
in general, and which ones gave low correlations? The 
results agree with the conclusions which we have already 
reached from the earlier experiments. The tests of discrim- 
ination and of motor ability have little relationship to the 
other tests, or to each other. The tests which Burt desig- 
nates as measures of association or of voluntary attention 
are the ones which have the highest correlation among 
themselves, and with other tests in general. The order in 
which the tests fall, as measured by the degree with which 
they correlate with other tests, is as follows:


ELEMENTARY SCHOOL
Dotting test
Alphabet test 
Card-sorting 
Card-dealing 
Spot pattern 
Tapping 
Mirror 
Pitch discrimination 
Discrimination of lines 
Touch discrimination 
Memory 
Discrimination of weight 


PREPARATORY SCHOOL 
Dotting test 
Alphabet test 
Mirror test
Memory 
Spot pattern 
Tapping 
Sorting cards 
Pitch discrimination 
Discrimination of lines 
Weight discrimination 
Touch discrimination 
Card-dealing 


In a few cases the order in which the particular tests fall 
differs rather widely in the two lists. This is true notably of 
the card-sorting, the card-dealing, the mirror test, and the 
memory test. On the whole, however, the two lists corre- 
spond very well, which means that the tests which show a 
high correlation with other tests in one group also show a 
high correlation in the other group. The correlation be- 
tween the order of the two lists of tests is .80. 

We may now examine the tests from a point of view of the 
extent to which they correlate with estimates of intelligence 
made by the teachers, or imputed intelligence, as Burt calls it. 
The order in which the tests fall, based upon the closeness of 
their correlation with imputed intelligence, is as follows: 


ELEMENTARY SCHOOL 
Spot pattern 
Mirror 
Alphabet 
Dotting 
Memory 
Sorting 
Tapping 
Dealing 
Pitch discrimination 
Discrimination of lines
Touch discrimination 
Weight discrimination 


PREPARATORY SCHOOL 
Dotting 
Alphabet 
Memory 
Spot pattern 
Sorting 
Mirror 
Tapping 
Pitch discrimination 
Dealing 
Discrimination of lines 
Touch discrimination 
Weight discrimination


There is a rather close agreement between the order of the
tests in the two schools as judged by this criterion, and also 
between the order based on the intercorrelation between tests 
and that based on the correlation with imputed intelligence. 
In other words, the tests which agree closely with the esti- 
mates by the teachers of the pupils’ abilities, also agree 
closely, in general, with the other tests. 
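The closeness of agreement between two such orders can be expressed as a single figure. The sketch below is an illustration only: it assumes the rank-difference formula of Spearman as the measure of agreement (the text does not say how the agreement was judged), and it uses the two orders based on imputed intelligence as they are printed above. The function name and the shortened test names are hypothetical.

```python
def rank_correlation(order_a, order_b):
    """Rank-difference coefficient for two orderings of the same items (no ties)."""
    if sorted(order_a) != sorted(order_b):
        raise ValueError("both lists must contain the same items")
    n = len(order_a)
    position_b = {name: i for i, name in enumerate(order_b)}
    # Sum of squared differences between the two ranks of each test.
    d_squared = sum((i - position_b[name]) ** 2 for i, name in enumerate(order_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Orders based on correlation with imputed intelligence, as listed above.
elementary = ["Spot pattern", "Mirror", "Alphabet", "Dotting", "Memory", "Sorting",
              "Tapping", "Dealing", "Pitch", "Lines", "Touch", "Weight"]
preparatory = ["Dotting", "Alphabet", "Memory", "Spot pattern", "Sorting", "Mirror",
               "Tapping", "Pitch", "Dealing", "Lines", "Touch", "Weight"]

rho = rank_correlation(elementary, preparatory)
print(round(rho, 2))  # 0.85
```

On this assumption the two schools agree to the extent of about .85, which is in keeping with the statement that the agreement is rather close.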

This method of rating a test on the basis of the closeness 
of its agreement with other tests, or with imputed intelli- 
gence, suggests a more elaborate comparison and an inter- 
pretation of the constitution of mental ability. This more 
elaborate comparison is made on the basis of a table which 
shows the correlation of every test with every other test. 
For brevity, we shall study only the table which was derived 
from the results of the elementary school group, Table VI. 

Table VI is arranged in this fashion. Every test is re- 
presented by a horizontal row of coefficients and by a vertical 
column of coefficients. Thus, the top row and the first 
column contain the coefficients from the dotting test. The 
next row and the next column represent the alphabet test. 
The fourth row and the fourth column represent the results
from the correlation with imputed intelligence. The table is so
constructed that there is a place for the coefficient of correla- 
tion between every test and every other test. By following 
each horizontal row to the point of its intersection with a 
given vertical column we may find the correlation between 
the tests which are represented in the given row and column. 
One coefficient in each row represents the correlation of a 
test with itself, or the reliability coefficient. The reliability 
coefficients are enclosed in heavy lines. 

Tables of the intercorrelation of mental tests of this sort 
form the basis of much discussion concerning the nature 
and the relationship of mental capacities. Spearman and 
Krueger, in their earlier studies, presented tables of this sort,
but they were not extensive enough to serve as satisfactory
evidence upon the problem. Burt was the first investigator 
to furnish adequate material for the construction of such a 
table. 


TABLE VI. THE INTERCORRELATIONS OF THE TESTS GIVEN BY
BURT TO THE BOYS OF THE ELEMENTARY SCHOOL


(The numbers at the head of the columns stand for the same tests as are 
named after these numbers on the left side.) 


                               1
 1. Dotting
 2. Alphabet-finding          .77
 3. Card-sorting              .67
 4. Imputed intelligence      .60
 5. Card-dealing              .69
 6. Spot pattern              .57
 7. Tapping                   .57
 8. Mirror drawing
 9. Pitch discrimination
10. Line discrimination
11. Touch discrimination
12. Memory
13. Weight discrimination


The order of the tests is based upon the average degree 
of correlation between the individual tests and all of the 
others. The test having the highest degree of correlation is 
at the top and the one having the lowest is at the bottom. 
The fundamental observation which is made concerning a 
table of this sort by Spearman and Burt is that there is a
kind of consistency about the relationship between the
mental traits which is significant. This consistency is 
shown by the fact that those tests which have high correla- 
tion on the average have relatively high correlation with 
each of the individual tests. The dotting test, for example, 
has a higher correlation with the alphabet-finding test than 
has any of the others. Its correlation with the card-sorting 
test is higher than that of most of the other tests. In similar
fashion, it correlates to a high degree with imputed intelli- 
gence, with card-dealing, with the spot-pattern test, and so 
on. This rule has exceptions, but we shall pass them over 
for the moment. 
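The arrangement of such a table may be sketched as follows. The example is illustrative only: the four tests and their coefficients are hypothetical and are not Burt's figures, the function name is an assumption, and the reliability coefficients on the diagonal are left out of the averages.

```python
def order_by_average_correlation(names, table):
    """Rank tests by their mean correlation with all of the other tests.

    table[i][j] holds the correlation between test i and test j. The
    diagonal holds the reliability coefficients and is not averaged.
    """
    n = len(names)
    averages = {}
    for i, name in enumerate(names):
        others = [table[i][j] for j in range(n) if j != i]
        averages[name] = sum(others) / len(others)
    ranked = sorted(names, key=lambda name: averages[name], reverse=True)
    return ranked, averages


# Hypothetical coefficients for four tests (not taken from Table VI).
names = ["dotting", "alphabet", "tapping", "weight"]
table = [
    [0.80, 0.70, 0.50, 0.20],
    [0.70, 0.75, 0.40, 0.10],
    [0.50, 0.40, 0.85, 0.00],
    [0.20, 0.10, 0.00, 0.60],
]

ranked, averages = order_by_average_correlation(names, table)
print(ranked)  # ['dotting', 'alphabet', 'tapping', 'weight']
```

In this made-up table the test with the highest average also has the highest coefficient in nearly every column, which is the kind of consistency referred to above.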

This consistency in the degree of intercorrelation between 
the various tests is expressed in the term the hierarchy of in- 
telligences. By hierarchy of intelligences is meant that 
situation in which some mental traits are superior to others 
as measured by the degree of their correlation with other 
mental traits. They stand higher on the scale as measured 
by intercorrelation. Other traits, on the other hand, stand 
lower on the scale as measured in this same fashion. This 
fact that a hierarchy exists, so that some capacities are cor- 
related relatively closely with all other capacities, while other 
capacities have in general only slight correlation with others, 
has led Spearman and his school to set up the hypothesis of a 
central factor which is shared in large measure by some capa- 
cities or traits and in small measure by others. This central 
factor is regarded as responsible for the correlation between 
tests. It was originally called general intelligence, but is 
now called by Spearman merely “G,” so as not to confuse it
with other meanings of intelligence. A fuller discussion of 
the evidence for “G” will be presented in the chapter on 
“The Nature of Intelligence.” 

We may refer to one additional study on correlation, in 
order to get further light upon the problems with which it
deals. The study which is selected for this further reference
is by Simpson.¹ Simpson gave a series of fifteen tests to two
groups of persons. These persons were selected so as to 
represent widely contrasted abilities, as measured by success 
in life and by the estimate of society. One of these groups
consisted of seventeen professors and advanced students in 
Columbia University, and the other of twenty men selected 
at the Salvation Army Industrial Home and at a Bowery 
Mission. The tests covered a somewhat wider range than 
those which were used by Burt, and in general emphasized
somewhat more the higher or more complex mental processes.
The nature of the tests may be gathered from the list,
which will be presented in a moment. 

Simpson presented his results in the form of a table similar 
to that which Burt reported, giving the intercorrelation 
between each test and every other test. The order of the 
tests, which is based upon the average of the intercorrela- 
tion, bears out the same conclusion which was drawn from 
Burt’s experiment, namely, that the more complex and 
higher mental processes have a higher intercorrelation than 
the simpler ones, which involve chiefly keenness of sensation 
or quickness and accuracy of movement. Furthermore, the 
order of correlation between the tests and estimated intelli- 
gence follows the same principle, and agrees substantially 
with the order as based upon intercorrelation. These facts
are shown by the two lists given below. 

The nature of most of the tests will be understood without 
further description. The test called “Crossing a’s” consists
of marking out as rapidly as possible all of the a’s from a 
printed text. The test in recognizing forms is given by first 
showing the subject a series of irregular geometrical forms, 
and then showing the same forms mixed with a larger num- 


¹ B. R. Simpson, Correlation of Mental Abilities. New York: Teachers
College, Columbia University. 1912.


THE ORDER OF THE TESTS BASED UPON INTERCORRELATION
Ebbinghaus test
Hard opposites
Memory of words
Easy opposites
Crossing a’s
Memory of passages
Adding
Recognition of form
Learning pairs
Scroll tests
Completion of words
Drawing length
Estimating length
Crossing out geometrical forms


THE ORDER OF THE TESTS BASED ON THE CORRELATION WITH ESTIMATED INTELLIGENCE
Completing words
Hard opposites
Memory of words
Ebbinghaus test
Easy opposites
Adding
Memory of passages
Learning pairs
Crossing a’s
Crossing out geometrical forms
Drawing length


ber of different ones, and requiring him to indicate the one he 
had seen previously. Learning pairs consisted of associat- 
ing hieroglyphic forms with words. The scroll test consisted 
in drawing with a pencil through an irregular and compli- 
cated lane. Completion of words consisted in finding as 
many words as possible by adding letters to two initial 
letters, such as ca. Crossing geometrical forms was similar 
in character to crossing out a’s. 

It will be noticed that two of the tests, the scroll test and 
estimating lengths, are not included in the correlation with 
intelligence. The correlation of the test of completing words 
was very unreliable. No significance is to be attached, 
therefore, to the fact that it heads the list in correlation with 
imputed intelligence. If we disregard the position of this 
test, it appears that in general those which have high inter- 
correlation also have high correlation with estimated intelli- 
gence. 

In order to determine whether certain general types or 
kinds of mental process have in general a higher intercorrelation
than do others, Simpson grouped the tests according
to the mental categories which they seemed to him to 
measure, and then found the average correlation between 
the tests thus grouped and the other tests. The list of the 
names which he uses to designate the general mental pro- 
cesses and their average intercorrelation is as follows: 


Selective thinking.................... .59
Memory................................ .50
Association........................... .48
Perception............................ [illegible]
Motor control......................... .26
Discrimination of length.............. .19


The following are examples of the correlation which Simpson 
found between tests which, according to his classification, 
fall in the same groups: 


Selective thinking:
    Ebbinghaus test and hard opposites test.............. .98
Memory:
    Memory of passages and memory of words............... .82
Association:
    Adding, easy opposites, completion of words,
    learning pairs....................................... [illegible]
Perception:
    Crossing a’s and geometrical forms................... 5
Discrimination:
    Drawing lengths and estimating lengths............... .24


It is evident that some pairs of tests which seem to belong 
in the same class according to the usual method of classifica- 
tion do not measure the same abilities. This forces upon us 
the question whether such a classification furnishes a basis 
for the analysis of abilities, or whether a classification of 
abilities and a description of special abilities must not follow 
some other line. 

The doubt as to the validity of the classification made by 
Simpson is deepened by the fact that in many cases the
correlation between pairs of traits which fall in different
classes is much closer than many of the correlations between 
traits in the same class. These are examples: 


Ebbinghaus test and memory of words.................... .94
Ebbinghaus test and memory of passages................. .91
Ebbinghaus test and recognition of form................ .88
Hard opposites and memory of words..................... .84
Hard opposites and memory of passages.................. .81
Hard opposites and adding.............................. .79
Memory of words and learning pairs..................... [illegible]
Memory of words and recognizing forms.................. .71
Crossing a’s and recognizing forms..................... [illegible]


If it proves to be possible to test special capacities it will 
be necessary to conform to the facts of intercorrelation in 
determining where the dividing lines between them lie. If 
Spearman’s two-factor theory is correct, special capacities do 
not fall into groups, and we should not expect them to be 
correlated with each other except in so far as they partake
of general capacity. The two-factor theory will be discussed
in the last chapter.


CHAPTER IV 
AGE SCALES: THE BINET SCALE 


THE studies of correlation between single tests, which have
been described in the last chapter, did not at once issue in 
the development of practicable scales. They were used 
later in the development of our so-called point scales, and 
this application will be described in connection with the 
account of these scales. The first successful scales of in- 
telligence were those of the type which Binet developed. 


1. Binet’s early experimental work 

As we have already seen, Binet, in the earlier period, 
experimented with tests of the type which had been used by 
other psychologists. These tests measured the simpler 
mental capacities separately. His age scale differs from most 
of his previous tests in that it includes many tests of the 
higher or more complex mental processes, and in that it was 
so arranged that the scores which the pupil makes in the 
various component tests of the scale can be combined into 
a composite score. 

The idea which led to the use of a composite score to 
express the total results of the pupil’s reaction to the tests of 
the scale was the idea of mental age. Binet apparently 
approached this idea in the beginning in a somewhat in- 
direct fashion. The first scale which he put out, in 1905, 
was simply a series of tests of widely different degrees of 
difficulty, arranged in order from the easiest to the hardest. 
Such a series of tests is very appropriately called a scale, 
because it ranges upward in difficulty. 

We may pause to consider briefly some of the characteristics
of this 1905 scale.¹ It consisted of thirty tests, some of
them being composed of several parts. One of these, in 
fact, included twenty-five individual tests. This multiplic- 
ity of tests is the first feature of significance, and one which 
is largely responsible for the success of our so-called intelli- 
gence tests. It has been found that, while single tests may 
sometimes have fairly good reliability and validity, groups
of tests are much superior in these respects. 

As has been said, the tests range from very easy to rela- 
tively difficult. The first test, in fact, is much easier than 
the easiest one in Binet’s later scale. It requires that the 
child should follow with his eye the course of a lighted match 
which is passed in front of his face. The fifth test of the 
series represents a higher stage of development. The ex- 
aminer wraps a piece of candy in a paper in the sight of 
the child and hands it to him to see if he removes the paper 
and eats the candy. In the tenth test the child is shown 
two lines and is required to tell which line is the longer. In 
test fourteen he is asked to give the meaning of the words 
spoon, house, dog, mamma. In test sixteen he is asked to 
tell the difference between well-known objects, such as paper 
and cloth. In test twenty-two he is asked to place in order 
five weights of eighteen, fifteen, twelve, nine, and six grams
respectively. In test twenty-six he is asked to put in one 
sentence the three words parrot, bank, and fortune. In test 
twenty-seven he is asked to respond to a number of ques- 
tions. This is the test which contains twenty-five items. 
Among them are: “What should one do when he is cold when
there is a fire in the house?” “What happens when one is
lazy and does not wish to work?” “Why is it better to persevere
in what one has begun, than to give it up and try

¹ A. Binet et T. Simon. “Méthodes nouvelles pour le diagnostic du
niveau intellectuel des anormaux”; in Année psychol., t. 11, pp. 191-244.
1905.




something new?” In the thirtieth test the child is required 
to tell the difference between esteem and friendship, and be- 
tween remorse and chagrin. The second important charac- 
teristic of this series of tests, then, is that it is a scale of 
increasing difficulty. This makes it possible to test abilities 
of wide range, and to place the individual on a scale of abil- 
ity which is fairly continuous and uniform from one level of 
experience to another. 

In Binet’s comments on this preliminary scale is contained
the germ of the later age scale. While the tests
themselves are not classified according to years, Binet in- 
dicates what a child of three, five, seven, nine, and eleven 
years may be expected to do on the tests. Thus he tells us
that a child of three years names and recognizes the names 
of the majority of objects which figure in his everyday life. 
A child of five years, for the most part, repeats three figures, 
compares two lines, and after a lesson, two weights; he is 
likewise able to define the names of familiar, concrete ob- 
jects. 

Finally, this 1905 scale involves the central idea accord- 
ing to which a difference in scores on an age scale is inter- 
preted as being a measure of differences in intelligence. 
Individuals of different degrees of intelligence at the same 
level of maturity are distinguished by the number of tests 
they can pass, just as are children of different ages. An 
idiot, for example, can pass the first six tests, or at least can 
pass no higher than the first six. The capacities of imbeciles
are represented by the range from tests seven to fifteen,
and of feeble-minded by the range from sixteen up to the 
point which represents normal intelligence. These levels of 
difficulty in the tests, and the levels of ability which they 
represent, stand for the ultimate mental status of the various 
types of individuals. The idiot will never develop beyond 
the stage represented by test six. Imbeciles will never
develop beyond the stage represented by test fifteen, and
so on. This means, then, that an adult of a certain degree of
mental ability, represented by mental defect of certain 
amount, reaches ultimately the mental capacity of a normal 
child of a given age. 

We have now reached the fundamental idea of the Binet 
scale and, in fact, the fundamental idea of all of the scales by 
means of which the mental capacity of children is measured. 
This fundamental idea is the identification of differences in 
mental capacity, or differences in brightness, above and 
below that of the average person, with differences in stages 
of mental development as represented by the capacity of 
children of various ages. 


2. The Binet-Simon scale of 1908 


The plan of using the stages of mental growth of a normal 
child as a scale by which to measure differences in intellectual 
capacity, or differences in brightness, is represented first in 
the concept of mental age, and its use as a measuring unit. 
This idea has its complete development in the next scale 
which Binet, in collaboration with Simon, put out in 1908.¹
In the 1908 scale each test is classified under some one age. 
The ages from three years to thirteen years, inclusive, are 
represented. The number of tests at each age varies from
three at age thirteen, to eight at age seven. Illustrations
from two of the years will serve to represent the entire scale. 


FIVE YEARS


1. Comparison of two weights, one pair three and twelve
grams respectively, the other pair, six and fifteen grams
respectively.
2. Copying a square with pen and ink.
3. Putting together two triangles so as to make the same form
as a rectangle.
4. Counting four pennies.


¹ A. Binet et T. Simon. “Le développement de l’intelligence chez les
enfants”; in Année psychol., t. 14, pp. 1-90. 1908.


ELEVEN YEARS 


1. Detecting the absurdity of a series of statements.
2. Building a sentence out of three words.
3. Naming any sixty words in three minutes.
4. A definition of abstract terms.
5. Arranging a number of words which have been disarranged
so as to make a meaningful sentence.


The method of using the scale, which is essentially the 
same as that of using the later revision, is this. The ex- 
aminer gives the child the tests in order of their difficulty, 
beginning at the point at which the child can probably pass 
all of the tests. He then proceeds upward until he reaches
the point at which the child fails on all of the tests of an age 
group. He next estimates the mental age of the child by 
taking as a basic age the point at which the child passes all 
of the tests. Then he adds to this one year for every five 
tests which the child passes beyond. This gives the child’s 
mental age. 
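The rule just stated can be put in the form of a short computation. The sketch below is illustrative and rests on two assumptions which the text does not settle: that the basic age is the highest level reached without any failure, counting upward from the lowest level given, and that credit is allowed only for each complete set of five tests passed beyond it. The function name and the record of the child are hypothetical.

```python
def mental_age_1908(results):
    """Estimate mental age from a record of tests passed and failed.

    results maps an age level (in years) to a list of booleans, one per
    test at that level, True meaning that the child passed the test.
    """
    levels = sorted(results)
    basic_age = None
    for age in levels:
        if all(results[age]):
            basic_age = age  # every test at this level was passed
        else:
            break  # first level with a failure ends the basic age
    if basic_age is None:
        raise ValueError("the lowest level given must be passed completely")
    # Count every test passed at levels above the basic age.
    passed_beyond = sum(sum(results[age]) for age in levels if age > basic_age)
    # Assumption: one year of credit for each complete set of five tests.
    return basic_age + passed_beyond // 5


# Hypothetical record: all tests passed at seven, four at eight,
# two at nine, none at ten.
child = {
    7: [True, True, True, True, True],
    8: [True, True, True, False, True],
    9: [True, False, False, True, False],
    10: [False, False, False, False, False],
}
print(mental_age_1908(child))  # 8
```

Here the basic age is seven, six tests are passed beyond it, and one year is added, giving a mental age of eight. A failure on one test at age eight is made up by successes at age nine.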

It is apparent that the mental age thus represents the com- 
posite of a child’s ability on a considerable number of tests. 
It is not required that he pass any particular test. If he 
fails on one, he may make it up by passing another. His 
mental age, then, represents a kind of average score which 
corresponds with what the average child of a particular 
chronological age can do. 

It will be apparent on a moment’s thought that, while the 
mental age may represent the child’s maturity, it does not 
directly represent his intelligence or his brightness. If an 
eight-year-old and a ten-year-old child have the same men- 
tal age, they may occupy the same level of maturity, but 
they do not possess the same degree of brightness. The
brightness or intelligence of a child must be found by taking
account of the relationship between his maturity, or his men- 
tal age, and his chronological age. Binet did not work out 
this problem to a final solution. He simply regarded a child 
as superior if he passed a test a year or two in advance of 
his chronological age, and as retarded if he passed only the 
tests a year or two below his chronological age. 


3. The 1911 revision 


After the appearance of the 1908 scale, a number of psychologists
applied it to children and reported upon the results.
Among these were Decroly and Degand,¹ of Belgium,
who reported that some of the tests were too easy and others
too difficult. In Germany, Bobertag² gave the tests to a
considerable number of children and reported very fully upon 
the responses which were made to each test of the scale. He 
reported particularly the percentage of children of each age 
who passed the various tests. He then attempted to deter- 
mine where the tests belong on the scale by placing them at 
that age at which 75 per cent passed. Binet had apparently 
used such a standard, but Bobertag attempted to apply it 
in more exact fashion. In the United States, Goddard³
applied the test to the feeble-minded children of the Train- 
ing School at Vineland, New Jersey, and also to a large 
number (about 1500) of normal children in the elementary 
school.⁴ He also criticized the test on the ground that

¹ O. Decroly et J. Degand. “La mesure de l’intelligence chez des enfants
normaux d’après des tests de Binet et Simon”; in Arch. de psychol., t. 9,
pp. 81-108. 1909.

² O. Bobertag. “A. Binet’s Arbeiten über die intellectuelle Entwicklung
des Schulkindes (1894-1909)”; in Zeitsch. für Angew. Psychol., B. 3,
pp. 230-59. 1909.

³ H. H. Goddard. “Four Hundred Feeble-Minded Children Classified
by the Binet Method”; in Ped. Sem., vol. 17, pp. 387-97. 1910.

⁴ H. H. Goddard. “Two Thousand Children Measured by the Binet
Measuring Scale of Intelligence”; in Ped. Sem., vol. 18, pp. 232-59. 1911.

individual tests were not properly placed, and that certain
parts of the scale were too easy and other parts too difficult. 
However, he reported that the scale as a whole was extremely 
reliable. This conclusion he drew from the fact that the 
distribution of the rankings of children taken as an entire 
group was practically in agreement with the normal distri- 
bution curve. His results, and those of other investigators, 
however, showed that the distribution for some of the in- 
dividual ages was far from normal. For the lower ages the 
tests were too easy, and the majority of the scores were 
above the normal, whereas for the upper ages they were 
too hard and the bulk of the scores were below the normal. 
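The 75 per cent standard which Bobertag applied in placing the tests may be sketched in the following way. This is an illustration under stated assumptions, not a record of his procedure: the test is placed at the lowest age at which the percentage passing reaches the standard, and the function name and the percentages are hypothetical.

```python
def placement_age(percent_passing, criterion=75.0):
    """Lowest age at which the percentage of children passing reaches the criterion."""
    for age in sorted(percent_passing):
        if percent_passing[age] >= criterion:
            return age
    return None  # the test is too hard for every age examined


# Hypothetical percentages of children passing one test, by age in years.
observed = {6: 20.0, 7: 48.0, 8: 77.0, 9: 90.0}
print(placement_age(observed))  # 8
```

A test which had been printed in the scale at age seven, but which on these figures belongs at age eight, would be counted as too difficult for the place given it.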

Taking account of these studies and of the recommendations
which were based upon them, Binet revised his scale
and published the revised form in 1911.¹ Due to his untimely
death, this was Binet’s final contribution to the testing
movement.

The changes in the scale were of two sorts. In the first 
place, it was made more uniform by having an equal number 
of tests for each age, namely, five. This avoided the error 
which was present in the calculation of the mental age in the 
1908 scale, due to the fact that at some parts of the scale it 
was easier to obtain advanced credit than at other parts. 
The other type of change was the transposition of some of 
the tests to different ages. In some cases, because the tests 
were apparently too difficult, the tests were moved to higher 
ages. In other cases, because they were too easy, they were 
transposed to lower ages. In order to overcome the excessive 
difficulty of the scale at the upper end, the tests for eleven 
years were moved up to the twelve-year period, and those 
for twelve years to fifteen years. The thirteen-year-old 
tests were called adult tests, and two others were added to
them. This shift, however, did not meet the difficulty,
since tests were not substituted for the years eleven, thirteen,
and fourteen.

¹ A. Binet. “Nouvelles recherches sur la mesure du niveau intellectuel
chez les enfants d’école”; in Année psychol., t. 17, pp. 145-201. 1911.

The method of finding the mental age was changed in one
respect, in that the child’s basic age was taken as the age at
which he passed all but one test, instead of every test. This
made the scale somewhat more flexible.

It is not certain whether the 1911 scale was an improvement
over the 1908 scale. Some investigators who have
compared the results of the two prefer the original scale.
The changes were made largely in response
to criticisms by others, rather than because Binet’s own
experience indicated their desirability. That the
ground of the changes was not certain is shown by the inconsistency
between the changes which were made by Binet
and those made by Goddard in his revision, which also
appeared in 1911. We may pass to a brief account of this
and other revisions made by other workers.


4. Other revisions of the Binet scale; Goddard’s 1911 
revision 

One of the earliest and most enthusiastic users of the 
Binet scale in the United States was H. H. Goddard, the 
chief psychologist in the Training School at Vineland.¹
Goddard very early began working with Binet’s scale, 
adapting it to American conditions. In some respects the 
changes which he made were similar to those which were 
made by Binet. For example, above the fourth year he 
provided the same number of tests for each age, namely, 
five. In general he made fewer changes in the position of 
the tests than did Binet, and in most cases retained in their
original position the tests which had been moved by Binet.
He retained the tests for eleven and twelve years, but moved
the thirteen-year tests up. He introduced a number of
tests of his own into the fifteen-year age group. Goddard,
of course, adapted the terminology and to some extent the
content of the tests to the experience of American children.
This scale was very widely used in the United States until it
was superseded by the more extensive revision made by
Terman and his collaborators, the Stanford Revision.

¹ H. H. Goddard. “A Revision of the Binet Scale”; in Training School
Bulletin, vol. 8, pp. 56-62. 1911.

Kuhlmann began working with the Binet scale rather 
early and has produced two revisions. The first appeared 
in 1912, and the second in 1922.¹ Since the second revision
is a modification and extension of the first, we may confine 
our discussion to it. The most important contribution 
which was made in Kuhlmann’s revision consists in the ex- 
tension of the scale at both ends, particularly at the lower 
end. His tests begin as low as three months and enable the 
examiner to calculate the mental capacity of very young 
children. At the upper end they extend to fifteen years. 
Further changes consist, first, in the standardization of 
procedure. This was generally found to be necessary before 
one could satisfactorily use Binet’s original scale. A num- 
ber of American authors have contributed to the scale by 
defining the method of procedure in giving it. In the second 
place, the scale was modified in the direction of making it 
more difficult at the lower end, and easier at the upper end. 
This was done by changing the location of tests. Again, 
nineteen of the original tests were eliminated because they 
were found to be unsatisfactory. To those which remain 
were added a large number, so that there were in the final 
scale eight tests for each age group above two years. The
total number of tests in the revised scale is one hundred and
twenty-nine. This constitutes a very fundamental and thoroughgoing
revision, and the scale would doubtless have had wider
use if it had not been anticipated by the equally extensive
revision made by Terman and his collaborators at Stanford
University.

¹ F. Kuhlmann. “A Revision of the Binet-Simon System for Measuring
the Intelligence of Children”; in Journal of Psycho-Asthenics, Monog.
Supplement, vol. 1, no. 1, September, 1912.

F. Kuhlmann. A Handbook of Mental Tests. Baltimore: Warwick &
York, 1922.


5. The Stanford Revision of the Binet scale 


The Stanford Revision of the Binet scale is the most widely
used age scale. It was extensively employed in the schools
previous to the World War. During the War it was one
of the two individual tests which were used with the
English-speaking recruits who made a low score on the
group tests. Its purpose was to confirm the standing on the
group tests and to provide a more accurate measure of intellectual
ability than the group tests afforded. It has continued
to be widely used in the schools up to the present.

Preliminary experimentation leading up to the Stanford 
revision was begun by Terman in collaboration with Childs 
in 1911 or 1912. The results of the preliminary experimentation
were published in 1913.¹ The purpose of this preliminary
investigation was to secure the scores from children
of different ages on a large number of new tests.
These tests were to be used to supplement the tests of the 
original scale. Some of these new tests were incorporated 
into the Stanford Revision. We shall call attention to these 
new tests in the more detailed description of this revision. 


6. Description of the Stanford revised scale

The new scale contains ninety items. Fifty-four of
these are from the Binet scale, and thirty-six are new.
Twenty-seven of these new tests were devised and standardized
in the previous investigation made by Terman. The
other new tests were borrowed from other investigators or
adapted from earlier Binet studies.

¹ L. M. Terman and H. G. Childs. “A Tentative Revision and Extension
of the Binet-Simon Measuring Scale of Intelligence”; in Journal of
Educational Psychology, vol. 3, pp. 61-74, 133-43, 198-208, 277-89. 1913.

In the tests which were selected from the Binet scale many
changes were made. Eighteen were shifted downward one
year, four downward two years, two three years, and one six
years. Three were shifted upward one year and one two
years. The location of each of the tests was determined by
the results of its application to about one thousand children
of a community near Stanford.

The completed scale contains tests for each age from three 
to ten, and in addition tests for ages twelve and fourteen, 
and for average adult and superior adult intelligence. There 
are six tests for each age except year twelve, for which eight 
tests are provided. The credit for each test is so arranged
that one year in mental age may be gained by passing the
tests for each year. Thus, up to year ten, in which there are
six tests for each age, the credit in mental age for passing
each test is two months. For year twelve, in which there are
eight tests, the credit for each is three months, which gives
a total credit of twenty-four months. This stands for both
year eleven and year twelve. For year fourteen, there are
six tests of four months’ credit each, or a total credit of
twenty-four months, and so on. For the average adult there
are six tests with five months’ credit for each, or a total 
credit of thirty months, and for superior adults six tests with 
six months each, giving a total credit of thirty-six months. 
The reason for the extra credit in the case of the average 
adult and the superior adult groups is that, as one approaches 
the end of the scale, the possibility of making credit by 
passing advanced tests is progressively reduced. 
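
The arithmetic of this credit scheme may be checked with a short sketch. It merely tabulates the figures given in the paragraph above; the names CREDIT_SCHEDULE and total_credit_months are illustrative and are not taken from Terman's manual.

```python
# Credit schedule as stated in the text: (number of tests, months of
# mental age credited per test) for each level of the scale.
# The dictionary keys are illustrative labels, not official ones.
CREDIT_SCHEDULE = {
    **{year: (6, 2) for year in range(3, 11)},  # years three to ten
    "XII": (8, 3),             # covers years eleven and twelve
    "XIV": (6, 4),             # four months each, twenty-four in all
    "average adult": (6, 5),
    "superior adult": (6, 6),
}


def total_credit_months(level):
    """Months of mental age gained by passing every test at one level."""
    tests, months_each = CREDIT_SCHEDULE[level]
    return tests * months_each


for level in CREDIT_SCHEDULE:
    print(level, total_credit_months(level))
```

Each of the single-year levels yields twelve months, the two double levels twenty-four, and the adult levels thirty and thirty-six, in agreement with the totals quoted above.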

As an illustration of the tests of the scale we may take 
those for age twelve. A copy of the record booklet contain- 
ing the tests for this age is reproduced on page 92. 


YEAR XII. (8 tests, 3 months each, or 6 tests, 4 months each.)

*1. Vocabulary, 40 words. (Score ........ Total vocab. ........)

 2. Abstract words. (3 of 5.)
      a. Pity ........
      b. Revenge ........
      c. Charity ........
      d. Envy ........
      e. Justice ........

 3. Ball and field. (Superior plan.) ........

*4. Dissected sentences. (2 of 3. 1 minute each.)
      a. FOR THE STARTED AN WE COUNTRY EARLY AT HOUR ........ Time ........
      b. TO ASKED PAPER MY TEACHER CORRECT I MY ........ Time ........
      c. A DEFENDS DOG GOOD HIS BRAVELY MASTER ........ Time ........

 5. Fables.
      a. Hercules and wagoner ........
      b. Maid and eggs ........
      c. Fox and crow ........
      d. Farmer and stork ........
      e. Miller, son and donkey ........

*6. Repeats 5 digits backwards. (1 of 3. Read 1 per second.)
      3-1-8-7-9 ........ 6-9-4-8-2 ........ 5-2-9-6-1 ........

*7. Pictures, interpretation. (3 of 4. “Explain this picture.”)
      a. Dutch Home ........
      b. Canoe ........
      c. Post-office ........
      d. Colonial Home ........

*8. Gives similarities, three things. (3 of 5. “In what way are ____, ____, ____, alike?”)
      a. Snake, cow, sparrow ........
      b. Book, teacher, newspaper ........
      c. Wool, cotton, leather ........
      d. Knife-blade, penny, piece of wire ........
      e. Rose, potato, tree ........




The vocabulary test was added in the Stanford Revision.
It consists of one hundred words selected at random from 
the dictionary, and arranged from easy to difficult or from 
familiar to unusual words. These hundred words are con- 
sidered a sufficiently large sample of the pupil’s vocabulary 
to enable one to calculate roughly his total vocabulary. 
Experiments indicate that the scores made with different 
groups of one hundred words similarly chosen at random are 
only slightly different. The vocabulary test, in the opinion 
of the author, has a far higher value than any other single 
test of the scale. It appears at several ages, and the child’s 
mental age in the test is determined by the number of words 
for which he can give the meaning. 
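
The sampling argument amounts to a simple proportion, which may be sketched as follows. The sketch assumes that the hundred words are a random sample from a dictionary of known size. The function name and the dictionary size used in the example are hypothetical, since the size of the dictionary is not stated here.

```python
def estimate_total_vocabulary(words_known, sample_size, dictionary_size):
    """Scale the share of sampled words known up to the whole dictionary.

    words_known may be fractional, since partial credit is sometimes given.
    """
    if not 0 <= words_known <= sample_size:
        raise ValueError("words_known must lie between 0 and sample_size")
    return words_known * dictionary_size / sample_size


# Hypothetical figures: 40 of the 100 sampled words are known, and the
# dictionary from which the sample was drawn holds 20,000 words.
print(estimate_total_vocabulary(40, 100, 20_000))
```

The estimate is rough in the same sense as in the text: it is only as good as the randomness of the sample and the fairness of the scoring.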

The instructions for giving the vocabulary test occupy 
about a page and a half of the text. These instructions in- 
clude the formula which is to be employed in presenting the 
test, the statement of the procedure to follow if the child 
hesitates or fails, the distance along the list which the experimenter
should proceed or the point at which he should stop,
cautions against helping the child, suggestions that he be
encouraged, and directions for recording the results. In
addition to this the method of scoring the test is given in 
detail. This occupies two and two-thirds pages, and con- 
tains a considerable number of illustrations. The difficulty 
in scoring the tests is to determine whether or not the defini- 
tion which the child gives comes within the scope of the re- 
quirements. The child is not required to give the formal 
definition, but only to indicate by some means whether he 
knows the true meaning and use of the words. For example, 
in the case of the word pork, if the child responds meat, and 
cannot tell what animal it comes from, he is given half credit. 
He is given full credit if he knows that pork comes from a 
pig. 

These minute directions for giving and scoring the test 
are typical of all of the tests of the scale. The manual which
contains these directions occupies two hundred and twenty- 
nine pages of Terman’s text, The Measurement of Intelli- 
gence. It is therefore obvious that a considerable amount 
of careful study and preliminary practice is necessary before 
an individual is prepared to give this test in such a way as to 
obtain valid results. 

The manner in which the result of the examination is re- 
corded is indicated by the record form supplied on the out- 
side page of the Record Booklet. This form also provides 
space to record additional information which is useful to 
supplement and interpret the mental test score. 


7. The derivation of the Stanford Revision 

The construction and development of this revision is 
described in a monograph published by Terman in 1917.¹

The first step in the standardization of the scale was to 
take the tests in the original Binet scale and the additional 
tests which had been tried out in a preliminary experiment, 
or which had been gathered from other sources, and make 
them into a trial scale. The basis for the selection of the 
tests for the trial scale was the percentage of children of 
various ages who passed the tests, as reported by the in- 
vestigators who used them. This trial scale was then given 
to about one thousand children up to the age of fourteen, and 
to about four hundred adults. 

In regard to the selection of the children upon whom the
test was standardized, the authors make the following statement,
“A plan was then devised for securing subjects who
should be as nearly as possible representative of the several
ages. The method was to select a school in a community of
average social status, a school attended by all or practically
all of the children in the district where it was located. In
order to get clear pictures of age differences, the tests were
confined to children who were within two months of a
birthday. To avoid accidental selection, all the children
within two months of a birthday were tested, in whatever
grade enrolled (below the high school). Tests of foreign-born
children, however, were eliminated in the treatment of
results” (p. 11).

¹ L. M. Terman and others. The Stanford Revision and Extension of the
Binet-Simon Scale for Measuring Intelligence. Baltimore: Warwick &
York, 1918.


RECORD BOOKLET

For The Stanford Revision of the Binet-Simon Tests as described in Terman’s
The Measurement of Intelligence

Copyright, 1916, by Houghton Mifflin Company. All rights reserved.

Mental Age ........ I.Q. ........

NOTES ON EXAMINATION                    SUMMARY
                                        YRS  MOS
Time begun ........ finished ........

SPECIAL INFORMATION

Standing height ........ Sitting height ........ Weight ........ Head circ. ........
Right grip ........ Left grip ........ Lung capacity ........
Social status: Very inferior, inferior, average, superior, very superior.
Years attended school ........ Grades repeated ........ Skipped ........
School success: Very inferior, inferior, average, superior, very superior.
Teacher’s est. of I: Very inferior, inferior, average, superior, very superior.

Since the directions for giving the tests were to be those 
in the final scale, they were worked out with the care and 
minuteness which has already been described. The tests 
were given by a number of students, all of whom had been 
carefully trained by Terman. The children’s responses 
were, for the most part, recorded verbatim, in order that the 
records might be rescored if it was necessary to change the 
difficulty of the scale. 

The next step was to score all of the children according to 
the method which was provisionally fixed upon, and to find 
their I.Q.s. The I.Q.s of each age were then thrown into a
distribution table. The requirement which was employed
in the examination of this table to determine whether or not
the scale was correct was that the distribution of I.Q.s of a
particular age should be normal, and that the median mental
age of the children should correspond to their chronological
age. That is, the median child of the ten-year-old group 
should have a mental age of ten years, and the mental ages 
of children above and below this mental age should be dis- 
tributed in such fashion as to form a symmetrical curve of 
distribution. 
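
The two requirements may be illustrated by a rough check on a single age group. This is only a sketch of the idea, not the procedure actually followed in the standardization. The tolerance of a quarter of a year, the crude quartile test of symmetry, and the sample data are all assumptions made for the illustration.

```python
from statistics import median, quantiles


def check_age_group(chronological_age, mental_ages, tolerance=0.25):
    """Check one age group against the two requirements described above.

    tolerance (in years) is an illustrative threshold, not from the text.
    """
    q1, _, q3 = quantiles(mental_ages, n=4)  # quartile cut points
    med = median(mental_ages)
    return {
        "median_mental_age": med,
        # Requirement 1: the median mental age matches the life age.
        "median_matches_age": abs(med - chronological_age) <= tolerance,
        # Requirement 2, judged crudely: the quartiles lie at about
        # equal distances above and below the median.
        "roughly_symmetric": abs((q3 - med) - (med - q1)) <= tolerance,
    }


# Hypothetical mental ages (in years) for a group of ten-year-olds.
ten_year_olds = [8.0, 9.0, 9.5, 10.0, 10.0, 10.5, 11.0, 12.0]
print(check_age_group(10, ten_year_olds))
```

A group whose median fell well above or below its life age would fail the first check, which corresponds to the case in which the tests at that level are too easy or too hard.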

As might be expected, the first trial did not come up to 
this requirement. Adjustments were then made by either 
changing the location of tests or by changing the standard 
of scoring, and the children’s records were rescored. On the 
second scoring and tabulation the scale still failed to come up
to the standard. It was only after the third revision and 
rescoring that it came sufficiently near the standard to 
satisfy its author. 

The care which was taken in the preparation and stand- 
ardization of the scale has justified itself in all its subsequent 
use. The intellectual standards which are represented in 
the scale have been found to be substantially accurate in the 
large number of cases in which the test has been given to 
English-speaking American children. 


8. Measures which are derived from the scale 


In the description of the original Binet scale it was re- 
marked that the score which the child made was expressed 
in terms of mental age. It was further said that the signifi- 
cance of the mental age was to be gathered from its com- 
parison with the child’s chronological age, but that this 
significance was expressed by Binet only in rough fashion 
when he said that if the child were one or two years below his 
chronological age mentally, he was to be regarded as re- 
tarded. 

It was soon discovered by the users of any form of the 
Binet scale that the significance of one year’s retardation or 
acceleration was different at the lower ages and at the higher 
ages. One year’s retardation was found to be more serious 
at the lower ages. To put it in another way, approximately 
twice as many children would be one year retarded at twelve 
years of age as at six years of age. Or, to put it in still an- 
other way, the same number of children were found to be 
retarded one year at six years of age as are retarded two 
years at twelve years of age. This means that a given 
amount of retardation, as expressed in years of mental age, 
is a variable quantity and depends upon the age of the child. 

The possible explanations of this fact will be considered in 
the chapter on the technique of mental tests. What concerns
us at this time is the fact and its practical meaning in
mental measurement. The first man to suggest a measure
which should avoid the difficulty which has been mentioned,
and which would have the same significance for children of
different ages, was William Stern. Stern called his measure
the mental quotient.¹ The mental quotient was to be found
by dividing the child’s mental age by his chronological age. 
Thus a child whose mental age was equal to his chronological 
age would have a mental quotient of 1. A twelve-year-old 
child whose mental age was ten would have a mental quo- 
tient of .833, or ten twelfths, while a child whose chrono- 
logical age was ten and mental age was twelve would have 
a mental quotient of 1.20. 
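
The three cases just given may be verified with a minimal sketch of Stern's quotient. The function name is illustrative only, and ages are taken in years.

```python
def mental_quotient(mental_age, chronological_age):
    """Stern's mental quotient: mental age divided by chronological age."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    return mental_age / chronological_age


# The three cases given in the text.
print(round(mental_quotient(12, 12), 3))  # mental age equal to life age
print(round(mental_quotient(10, 12), 3))  # twelve-year-old, mental age ten
print(round(mental_quotient(12, 10), 3))  # ten-year-old, mental age twelve
```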

Terman’s statistics convinced him that this type of 
measure was substantially correct. He called it, however, 
the intelligence quotient or I.Q., and expressed the quo- 
tients as whole numbers by multiplying them by 100. The 
test of the correctness of the measure is the comparison of 
the range of intelligence quotients for successive life ages. 
Terman found that by beginning with the lower ages and 
comparing successive two-year groups, the middle half of 
the intelligence quotients for each group covered substan- 
tially the same range. The lowest range of the middle fifty 
per cent of the intelligence quotients was fifteen points, and 
the highest range seventeen points. Thus the intelligence 
quotients of the children of five and six years of age covered 
a range from 97 to 111, while those of eleven and twelve 
years combined had a range from 92 to 108 (p. 40). 

¹ William Stern. The Psychological Methods of Testing Intelligence, p. 80.
Baltimore: Warwick & York, 1914.

The intelligence quotient, expressed in words, then, means
the relation between the child’s mental development and
what we should expect of him at his age. If a child maintains
the same relative intellectual capacity from year to
year, he should have the same intelligence quotient. This
would not be true, as we have found, of the difference between
his chronological and mental age. This difference
would increase as he grows older. We have thus in the
intelligence quotient a measure which is constant, and by
means of which we can compare children of different ages.
Assuming that differences in intellectual maturity constitute
a means of measuring differences in intellectual
capacity or brightness, the intelligence quotient, or the
I.Q., becomes a measure of brightness.
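
The contrast between the constant quotient and the growing difference may be shown with a small sketch. It assumes a hypothetical child whose I.Q. remains at 75. The function names are illustrative, and the I.Q. is taken as one hundred times the ratio of mental age to chronological age, as described above.

```python
def intelligence_quotient(mental_age, chronological_age):
    """I.Q.: the mental quotient multiplied by 100."""
    return 100 * mental_age / chronological_age


def mental_age_for(iq, chronological_age):
    """Mental age implied by a given I.Q. at a given life age."""
    return iq * chronological_age / 100


# A hypothetical child whose I.Q. stays at 75 as he grows older.
for age in (4, 8, 12):
    mental_age = mental_age_for(75, age)
    print(age, mental_age, age - mental_age)
```

With the quotient held fixed, the retardation in years grows from one year at age four to two years at age eight and three years at age twelve, which is why the difference in years cannot serve as a constant measure.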

The significance of the various intelligence quotients can 
best be grasped by an examination of the following table, 
which indicates the percentage of persons who are awarded 
the various intelligence quotients. 


TABLE VII. THE DISTRIBUTION OF INTELLIGENCE QUOTIENTS

The percentage of individuals who have various intelligence
quotients or lower:

I.Q. ........   70   73   76   78   85   88   91   92   95
Per cent ....    1    2    3    5   10   15   20   25   33⅓

The percentage of individuals who have various intelligence
quotients or higher:

I.Q. ........  106  108  110  113  116  122  125  128  130
Per cent ....  33⅓   25   20   15   10    5    3    2    1


This table is to be read as follows: The lowest one per cent 
of persons in general have an I.Q. of 70 or below. The low- 
est two per cent have an I.Q. of 73 or below. The lowest 
twenty-five per cent have an I.Q. of 92 or below. The 
highest one per cent have an I.Q. of 130 or above. The highest
twenty per cent have an I.Q. of 110 or above, and the
highest thirty-three and one third per cent have an I.Q. of
106 or above. We see that according to this table the middle
fifty per cent of all individuals have intelligence quotients
ranging from 92 to 108.
Another way of indicating the significance of the intelligence
quotient is to use descriptive terms, thus:


CLASS                       RANGE OF I.Q.s

Near genius ............... 140+
Very superior ............. 120-140
Superior .................. 110-120
Normal ....................  90-110
Dull ......................  80-90
Border-line ...............  70-80
Moron .....................  50-70
Imbecile ..................  25-50
Idiot .....................   0-25


The lower three groups, including the intelligence quotients
from 0 to 70, are designated as feeble-minded. Thus a
ten-year-old child whose mental age was seven years or less
would be classed as feeble-minded, according to this scale.
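
The classification may be expressed as a simple lookup, sketched below. Since the ranges in the table share their end points, the sketch assumes that a boundary value falls in the lower class. This agrees with the example of the ten-year-old just given, but it is otherwise an assumption, and the names used are illustrative.

```python
# Upper limit of each class, in ascending order, from the table above.
# Assumption: a boundary value (e.g. 70) belongs to the lower class.
CLASSES = [
    (25, "Idiot"),
    (50, "Imbecile"),
    (70, "Moron"),
    (80, "Border-line"),
    (90, "Dull"),
    (110, "Normal"),
    (120, "Superior"),
    (140, "Very superior"),
]


def classify(iq):
    """Return the descriptive class for an intelligence quotient."""
    for upper_limit, name in CLASSES:
        if iq <= upper_limit:
            return name
    return "Near genius"


def is_feeble_minded(iq):
    """The lower three classes, I.Q. 0 to 70, are termed feeble-minded."""
    return iq <= 70


# The example from the text: a ten-year-old with a mental age of seven.
iq = 100 * 7 / 10
print(iq, classify(iq), is_feeble_minded(iq))
```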


9. The use of the Stanford Revision 


The following materials are needed for the application of 
the Stanford Revision of the Binet scale: 


1. The Measurement of Intelligence, by L. M. Terman, pub- 
lished by Houghton Mifflin Company, 1916. This is the Manual 
which is the source of information concerning the application and 
the scoring of the tests. It is absolutely essential to the giving of 
the tests, and must be studied with great care. 

2. Test Material for the Measurement of Intelligence, by L. M. 
Terman, published by Houghton Mifflin Company. This con- 
tains figures and pictures for the presentation of some of the tests 
and for scoring others. 

3. Record Booklets, published by Houghton Mifflin Company, 
and furnished in lots of twenty-five. These booklets furnish 
spaces to record the child’s answers and to record his score. 


After having studied the Manual carefully, the examiner 
may, in the actual conduct of the test, use the Condensed 
Guide for the Stanford Revision of the Binet-Simon Test, by 
L. M. Terman, Houghton Mifflin Company, 1920. 
The examiner who is very familiar with the scoring of
the tests may also use, in place of the larger Record Booklets,
a condensed record sheet, consisting of one folded sheet. 
This is also published by Houghton Mifflin Company, and 
furnished in lots of twenty-five. It is advisable, though, at 
least in most cases, to use the larger Record Booklets. 

In addition to these materials, the examiner must provide 
himself, in order to give the scale completely, with the follow- 
ing: 


1. A set of weights. These weights may be obtained from C. 
H. Stoelting Company, 3034 Carroll Avenue, Chicago. 

2. Two or three pieces of string. Shoestrings are satisfactory. 

3. A number of sheets of writing-paper, and some sheets of 
thin paper. 

4. A watch with a second-hand. 

5. A number of coins — thirteen pennies, a nickel, a dime, and 
a quarter. 

6. A pencil. 

7. A pen and ink. 

8. Two cards, each about two by three inches, one cut along
the diagonal.

9. A key. 

10. A small pasteboard box. 

11. Three one-cent postage stamps. 

12. Three two-cent postage stamps. 


The list of materials indicates that the administration of 
the test is a rather elaborate matter and one which requires 
careful preparation. The following preparation is the 
minimum if one is to obtain a reliable rating. The examiner 
should first make a minute study of the author’s Manual. 
He should then give a few preliminary tests. The responses 
in the case of these tests should be written out as fully as 
possible. He should then refer to the Manual for the scor- 
ing of the tests. The children tested in this preliminary 
fashion should be of various ages, so that one will acquire 
experience throughout the entire scale. After thus giving
a few tests and scoring them, the examiner should study the 
Manual again. If he can give some tests under the super- 
vision of a trained examiner, this will be to his advantage. 
He may next give a few more practice tests, and score them 
by careful reference to the Manual. He is now possibly 
ready to apply the scale and to use the results as means of 
estimating a child’s intelligence. He should continue, how- 
ever, to refer frequently to the Manual and to record the 
responses of the children fully. Only after having given a 
considerable number of tests should he trust himself to give 
the tests without consulting the guide, or to score them with- 
out reference to the Manual. Even the experienced exam- 
iner will have to refer occasionally to the Manual to settle 
doubtful points in scoring. 

If the scale is as difficult to give as is implied in the preceding
description, one may raise the question whether it is
worth the elaborate preparation and the time required to 
give it. As is well known, we have available now many 
group tests which can be given to large numbers of children 
in the same time that is required to test one child by the 
Stanford Revision. The Stanford Revision was, of course, 
prepared and put on the market before the group tests had 
been thought of. It still retains certain advantages, how- 
ever, in comparison with the group test. 

An individual test has an advantage over any group test 
in that the examiner has an opportunity to observe the child, 
to notice any peculiarities in his action, and to adapt himself 
to the child’s peculiarities so as to be sure that he under- 
stands every feature of the test, and that he does his best in 
it. Here, then, are two advantages. The observer can 
notice the peculiarities in the child’s behavior and he can 
secure the child’s maximum response. 

A somewhat similar advantage to the last one is that the
child, in taking the individual test, has an opportunity to
make an unhurried response, unlimited by the requirements 
of rapid performance. There are, it is true, some time 
limits in the various tests, but in most cases they enable the 
child to give a satisfactory answer if he is capable of doing so. 
In any case, speed is a comparatively unimportant feature 
of the scale, whereas in the group scale speed is at least as 
important as the child’s ability to perform a difficult task. 
It may be that the measurement of the speed of perfor- 
mance, or the measurement of the child’s performance under 
conditions in which rapidity is important, is also desirable. 
We shall see later that it would probably be worth while 
to have pure speed tests, and other tests which may be 
designated power tests, in order that we may analyze the 
child’s ability. For the present, the individual tests are the 
best measure of the limit of the individual’s capacity when 
they are not complicated by the necessity of rapid response. 

Whether a simpler individual examination exists, or can be 
devised, which possesses the advantages of the Stanford 
Revision we are not yet able to judge. The Herring Re- 
vision, described in Chapter VI, is simpler than the Stanford 
Revision in that it requires less preparation on the part of the 
examiner and less elaborate materials. Whether its stand- 
ardization is as adequate and its reliability as great, we have 
not yet the evidence which enables us to pass an opinion. 
In any case, the Stanford Revision has performed a very 
great service in the field of testing, and has stood for years 
as the outstanding example of carefully and scientifically 
standardized mental tests. 


CHAPTER V
TESTS FOR THE ANALYSIS OF MENTAL CAPACITY 


We have seen how mental tests developed from single tests
into age scales. In the beginning, the purpose which was 
back of the development of the single tests was the measure- 
ment of specific mental capacities. The evidence is that the 
early psychologists did not have in mind, primarily, the 
measurement of general mental capacity. The development 
of tests which would measure general capacity, or general 
intelligence, as we have seen, arose in two ways. On the 
one hand, the studies of correlation, such as those which were 
made by Spearman and his successors, brought to light the 
fact that the mental processes are interrelated, and prompted 
the search for mental tests which are closely interrelated, and 
which may be supposed, therefore, to measure some general 
or central capacity. We shall see later how this study of 
correlation was largely influential in the development of 
our present-day point scales. On the other hand, the age 
scale of Binet emphasized general mental capacity through 
the measurement of a composite of mental traits. While 
Spearman used as his criterion of the significance of a test 
for the measurement of intelligence its correlation with
other tests, Binet used the criterion of age progress. He 
worked on the assumption that if a mental test gives scores 
which advance rapidly with age, it represents general in- 
telligence. Mental maturity, that is, was identified by 
Binet with brightness. We see, then, that the correlation
movement and the age-scale movement both directed attention
toward general intelligence rather than toward
particular mental functions. 
1. Test groups


We have now to consider another type of study of mental 
tests which represented to some degree the earlier interest 
in the measurement of special mental capacities as well as 
the measurement of general capacity. About the time of 
the appearance of Binet’s final revision, a number of psy- 
chologists were bringing together groups of mental tests 
which were selected for the purpose of gaining an all-round 
inventory of the individual’s mental capacity. The individ- 
ual tests were chosen, not primarily because they correlated 
with one another, or because they showed marked progress 
with age, but because they were thought to measure certain 
specific mental traits which it was important to measure. 
In some cases a composite score in all of the tests of the 
group was found, but in all cases some attention was paid to 
the scores of the individual tests, as well as to the composite 
score. We may call these collective tests test groups. 

The test groups, as we have seen, have something of the 
characteristics of the tests of specialized ability, and some- 
thing of the characteristics of the composite scales for the 
measurement of intelligence. They tend to develop toward 
one or the other of these two extremes. If they develop 
toward the greater specialized measurement of individual 
mental capacities, they become profile scales. A profile 
scale is one which keeps distinct the measures of individual 
traits, and at the same time exhibits them in relationship to 
one another in an organized pattern. These we shall de- 
scribe in the latter part of the chapter. The composite 
scale, on the other hand, disregards the scores in the indi- 
vidual tests and uses merely the composite score of the en- 
tire group of tests. 

The organization of a group of tests which shall analyze 
mental capacity as a whole into constituent elements pre- 
supposes that such an analysis is possible. It presupposes 
the existence of clearly distinguishable and measurable
capacities. The efforts to classify mental capacities and to 
find tests to measure them have proved peculiarly baffling 
because the investigator has the double problem of finding 
whether a particular hypothetical capacity is really a distinct 
capacity, and at the same time of finding a means of 
measuring it. As a result, our facts and theories in reference
to the testing of special capacities are in a confused
state. For example, there is vigorous debate as to whether
intelligence is a single capacity, along with other specialized
capacities, or whether intelligence itself is made up
of elements. (See chapter on “The Nature of Intelligence.”)
This being the case, we shall survey the efforts
which have been made to analyze mental capacity by
means of tests, keeping in mind the variation in the presuppositions
regarding the lines which such an analysis
should follow.


2. The Healy-Fernald test group 


We may first describe in some detail a representative 
example of a test group, and then mention the other chief 
groups more briefly. The group which is to be mentioned 
in detail is that which was devised by Healy and Fernald. 
This test group, like the Binet scale, arose from a practical 
need. The principal author, Dr. Healy, was the psycholo- 
gist of the Psychopathic Institute, which was unofficially 
connected with the Juvenile Court of Cook County, Illinois. 
His duty, in this position, was to examine the children who 
were brought to the court, and to endeavor to discover the 
cause or causes of their delinquency and the mode of treat- 
ment which would be most likely to remedy it. It was 
necessary, in order to make a thorough diagnosis, to conduct 


1 William Healy and Grace M. Fernald. Tests for Practical Mental 
Classification. Psychol. Monog., vol. 13, no. 2. 1911. 
a mental examination, and it was for this purpose that the
series of mental tests was collected and organized. 

At the time the Healy-Fernald tests were being developed, 
the 1908 Binet scale was in use. The authors were not com- 
pletely satisfied with it, however, on two grounds. In the 
first place, they wished to determine not only the child’s 
general intellectual level, but also the capacities in which he 
was strong or weak. In the second place, they wished to 
discover, if possible, what prominent traits might be utilized 
in devising curative treatment. The reaction of the child 
to the various tests, therefore, was kept distinct, and his 
general mental level was determined, not by calculation of 
a composite score, but by a general summary of his response 
to the various single tests. 

Another characteristic of this group of tests is that the 
procedure of giving, and particularly of scoring the tests, 
was not worked out in as great detail as is common in our 
present-day scales, or as was the case with the Stanford 
Revision of the Binet scale. These tests, in this respect, 
resemble the earlier Binet scale of 1905. The authors did 
not emphasize the objective score which the child made so 
much as his general behavior and the way he went about the 
tasks which were set him. In this respect the tests are like 
those which have been used for many years by the psychia- 
trist. The Stanford Revision, as we have seen, presents on 
the contrary a very elaborate and very detailed set of direc- 
tions for the giving and scoring of each test. The effort is to 
make the administration of the scale as objective as possible, 
and to rely principally upon the score which issues from it. 
Our present scales have gone still further in the direction of 
making the presentation and the scoring of the test objective 
or independent of the judgment of the examiner. 

We may illustrate the characteristics of the Healy-Fer- 
nald group by a list of the individual tests. 
LIST OF THE HEALY-FERNALD TESTS

1. Picture form board. This consists of a simple picture pasted on
thin board, having certain parts cut out by a scroll saw.
These parts are to be fitted in the proper places by the
children.

2. Picture puzzle. Similar to the picture form board, but with a
larger number of pieces cut out.

3. Construction puzzle A. This puzzle consists of a frame and
five rectangular pieces. These rectangular pieces can be fitted
together into the frame.

4. Construction puzzle B. This is similar to puzzle A, but has
six spaces to be filled instead of one, and the pieces are of
various shapes. (See Fig. 2.)

Fig. 2. Illustration of Construction Puzzle B of the Healy-Fernald Series
(Reproduced by permission of C. H. Stoelting Co.)

5. Puzzle box. (See Fig. 3.) This is a box about eight inches
square with a glass top. The top is hinged and fastened with
a bolt-hook on the front of the box in plain view of the child.
This bolt-hook is kept in place by a string which passes to the
inside of the box and is hooked over a post. This again is kept
tight by another string, which is fastened in another place, and
so on. There are five or six steps altogether and the child
may discover how to open the box by tracing the fastenings
step by step, and then beginning at the end of the series.

Fig. 3. Illustration of the Puzzle Box of the Healy-Fernald Series
(Reproduced by permission of C. H. Stoelting Co.)


6. “Aussage” or testimony test. This test consists in showing the
child the picture of a butcher shop and making record of the
number of things which he is able to report upon after the
picture has been removed. The child’s suggestibility is also
examined by asking him questions about things which were
not in the picture and seeing whether he yields to the suggestion.

7. Drawing. The child is shown the two Binet figures, each for
five seconds. The child is asked to draw the figures from
memory.

8. A simple learning test, by the substitution method. Nine simple
figures are to be substituted for the nine digits. At the top of
the sheet is a key containing the simple figures and the digits,
each figure to correspond to one digit. Below are series of
rows of the figures to which the child must attach the appropriate
digits.

9. Cross-line test A. This consists of a simple figure of two
cross-lines, in which are inserted the first four numbers. The child
is to learn to use the adjacent part of a figure to represent each
of the digits.

10. Cross-line B. The test is similar to cross-line A, except there
are two lines in each direction at right angles to each other,
making nine spaces, and there are nine digits instead of
four.

11. Code-test. This is a complication of the two cross-line tests
in which two of the more complicated cross-lines and two of
the simpler cross-lines are used together to represent all of the
letters of the alphabet. The child learns the code by learning
which figures represent the various letters, and then writes a
message in the code from memory.

12. Visual-verbal memory. Tested by requiring the child to reproduce
what he can of a passage which he reads.

13. Auditory-verbal memory. A similar test in which the child
reproduces what is spoken to him.

14. Instruction box. Box in which a small door is fastened by a
mechanism which is concealed inside of the box and which can
be opened by moving in particular ways certain levers projecting
out of the box. The examiner gives the child instructions
all at once, and notes the faithfulness and accuracy with
which he follows them.

15. Opposites test. A series of words which have easy opposites
are given to the child and he is required to give the opposite
words.

16. Motor coördination tests. The child places a dot quickly in as
many half-inch squares as he can in thirty seconds.

17. Handwriting. The child writes a simple passage in order that
the quality of his writing and the mode of his coördination
may be observed.

18. Arithmetic. The child is given a few simple arithmetic
problems.

19. Reading. The child is given some simple reading passages.

20. Checkers. The child plays a game of checkers with the examiner
in order that he may observe how foresighted the child is
in his reaction. The test can only be given to children who
know the game.

21. Reaction to moral questions. The test gives rather incidental
information concerning the child’s moral attitude. 

22. Information test. Such questions as, “Who is the President?”
“What does Fourth of July celebrate?” “What is the largest
city in America?” etc., are given to measure the child’s general
store of information.

23. There was later added a pictorial completion test which is 
more elaborate than the picture puzzles in the first two tests. 


It will readily be seen that a group of tests of this sort is 
different in character and in purpose from such a scale as 
the Stanford Revision. Its aim is not to establish a definite, 
quantitative measure of the child’s intellectual capacity. 
It is rather to make a more qualitative analysis of his intellectual
capacity and of the mode of his reaction to his
environment. This analysis is to be made, not simply for
the purpose of measuring his various intellectual traits, but 
also in order that incidental information may be gained con- 
cerning weaknesses which may have made it easy to fall 
into delinquency, and strong traits which may be used in 
promoting his recovery. The tests are not elaborately 
standardized, and the scores are not to be used in any pre- 
cise fashion. For example, there are no age norms pre- 
sented. 

Age norms on these tests were worked out by a later 
associate of Healy’s, Clara Schmitt.1 The tests were given
to children of different ages and the scores made by them 
were recorded. The scale is not well adapted to a rigid 
standardization of this sort, however, and it has never been 
used in the same fashion as our age scales or point scales. 

An attempt to work out a definite method of using various 
tests of this series, as means of diagnosing the special abilities 
or disabilities which underlie success or failure in the various 


1 Clara Schmitt. Standardization of Tests for Defective Children. Psychol. 
Monog., vol. 19, no. 2. 1915. 
school subjects, has been made by Bronner.1 The attempt,
however, is not convincing because it is based on the de- 
scription of only a few cases. It requires the survey of a 
large group of cases to demonstrate that defective mastery 
of reading or arithmetic, for example, is due to deficiency in 
the particular ability measured by a mental test. It would 
be necessary to show that a low score in the mental test was 
uniformly accompanied by deficiency in the subject, and 
that a high score in the test was uniformly accompanied by 
success in the subject. This has never been shown, so far 
as the writer is aware, for any specialized test or any school 
subject. 
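
The kind of check called for here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the cut-off score, and the eight cases are all hypothetical, and nothing in the sketch is taken from Bronner's material. It sorts the cases into a fourfold table (low or high test score against deficiency or success in the subject) and reports the fraction of cases in which the two agree.

```python
def fourfold_table(cases, cutoff):
    """Count cases by test level (low/high) and school outcome.

    `cases` is a list of (test_score, successful) pairs; a score below
    `cutoff` counts as low. Both are hypothetical inputs.
    """
    table = {
        ("low", "deficient"): 0,
        ("low", "successful"): 0,
        ("high", "deficient"): 0,
        ("high", "successful"): 0,
    }
    for test_score, successful in cases:
        level = "low" if test_score < cutoff else "high"
        outcome = "successful" if successful else "deficient"
        table[(level, outcome)] += 1
    return table


def agreement(table):
    """Fraction of cases in which test level and school outcome agree."""
    agree = table[("low", "deficient")] + table[("high", "successful")]
    return agree / sum(table.values())


# Invented cases: (score on the mental test, success in the school subject).
cases = [(3, False), (4, False), (5, True), (8, True),
         (9, True), (2, True), (7, False), (10, True)]
table = fourfold_table(cases, cutoff=6)
print(table)
print(agreement(table))
```

Only an agreement close to 1.0 in a large group of cases would support the claim that the deficiency in the subject is due to the ability measured by the test; a handful of described cases cannot show it.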

There has been some dispute concerning the relative value 
of the tests which are standardized by elaborate statistical 
methods, and the qualitative tests such as these of the 
Healy-Fernald series or the tests which the psychiatrist uses. 
J. V. Haberman,2 for example, criticizes sharply the application
of rigid statistical methods to the development of
mental tests. Haberman insists that standardization is 
useless in the case of the untrained examiner, and not neces- 
sary in the case of the trained examiner. The problem is 
not to be solved in these terms. Even the trained examiner 
needs a rigidly standardized test for purposes of making an 
exact quantitative measure of the child’s capacity. This 
type of measure can also be secured with a fair degree of 
accuracy, even by the untrained examiner. For the quali- 
tative analysis of the nature of those defects, or of the 
psycho-physiological conditions for which standardized 
tests have not yet been devised, the trained examiner is 
necessary. Thoroughly standardized tests which give 


1 Augusta F. Bronner. The Psychology of Special Abilities and Disabilities.
Little, Brown & Co., 1917.
2 J. V. Haberman, “The Intelligence Examination and Evaluation”; in
Psychol. Review, vol. 23, pp. 352-79, 383-500. 1916.
objective quantitative measures of few, if any, special intellectual
capacities or of the non-intellectual traits have
yet been worked out. Several tests of these hitherto un- 
measured traits have recently been devised, however, and 
no one can say what the limit of their development will be. 


3. Other test groups 


One of the early test groups, and one which was applied 
extensively to children prior to our present-day highly 
standardized tests, is a group which was brought together 
by Pyle.1 A distinctive characteristic of Pyle’s group of 
tests is that they were so devised that they could be given 
to whole classes of children at a time. The aim of Pyle’s 
test was to establish norms for children at different ages, 
and to measure mental and physical growth by giving the 
test successively to the same children, or to children of different
ages. The test measured, for the most part, rather
simple processes of memory and association. One of them, 
for example, was a simple substitution test similar to that 
used by Healy and Fernald; another was a test of memory 
span; a third, of word building; and a fourth of opposites. 
Pyle gives extensive tables of norms on these tests, but 
suggests no method by which the scores on the individual 
tests may be interpreted. No very practical use, therefore, 
has been made of his group, except that which was made by 
Pintner. 

Pintner adapted a number of Pyle’s tests, and added a 
completion test and an arithmetic test, so as to constitute 
a group of tests which could be given either individually or 
together. The scores in the various tests were made com- 
parable by expressing them in terms of percentile scores. 


1 W. H. Pyle. The Examination of School Children. The Macmillan
Company, 1913.
2 R. Pintner. The Mental Survey. D. Appleton & Co., 1918.
The scores thus expressed were then combined, and the
combined score expressed in percentiles. This procedure 
made of the tests a composite scale. 
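
The percentile procedure just described may be illustrated by a short Python sketch. It is offered under stated assumptions and is not Pintner's own computation: the function names, the mid-rank definition of a percentile, the use of a simple average to combine the percentiles, and all of the scores are hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores):
    """Mid-rank percentile of `score` within a norm group (assumed definition)."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)
    equal = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * equal) / len(ordered)


def combined_score(raw, norms):
    """Average of the percentile ranks on the separate tests."""
    ranks = [percentile_rank(raw[test], norms[test]) for test in norms]
    return sum(ranks) / len(ranks)


def composite_percentile(raw, norms):
    """Express the combined score as a percentile of the norm group's own
    combined scores. `norms[test][i]` is the raw score of norm child i."""
    group_size = len(next(iter(norms.values())))
    group = [
        combined_score({test: norms[test][i] for test in norms}, norms)
        for i in range(group_size)
    ]
    return percentile_rank(combined_score(raw, norms), group)


# Invented norm group of five children on two tests.
norms = {
    "substitution": [10, 12, 15, 18, 20],
    "opposites": [5, 7, 8, 9, 12],
}
child = {"substitution": 15, "opposites": 8}
print(composite_percentile(child, norms))
```

Because every test is first reduced to the same percentile units, tests scored in quite different raw units can be combined, and the single figure which results is what makes the group a composite scale.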

Another group of tests, which is not primarily a composite 
scale, but which can be combined by the percentile method, 
is the one devised by Woolley for use in the study of the 
mentality of working children. These tests were to be 
given to children who left school to go to work, and also to 
children who remained in school. The purpose was to study 
the mental characteristics of the working children, and the 
effect of their occupation upon their capacity. For this 
purpose a large variety of mental measurements were made, 
the aim being to secure, not primarily a composite measure, 
but a qualitative analysis of abilities. The tests, therefore,
included certain psycho-physical tests, as tests of strength 
of movement, of physical capacity, of rapidity of reaction; 
and tests of the various mental capacities, such as associa- 
tion, problem solving, memory, imagination, and so on. 
The tests were scored on a percentile scale. 

Another group of tests, which was also devised for a 
special purpose, was that used at Ellis Island by Knox.2
This group of tests, because it was to be used with immi- 
grants, was so designed as to avoid the use of language, either 
in presenting the tests or in responding to them. The group 
is composed of what are commonly called performance tests. 
The form board is one illustration of such tests. This con- 
sists of a board with openings of various shapes cut out of 
it, and blocks which must be fitted into the openings. A 
number of types of form boards were used. Another well- 
known test of this group is the so-called Knox Cube Test. 


1 Helen T. Woolley and Charlotte R. Fischer. Mental and Physical 
Measurements of Working Children. Psychol. Monog. vol. 18. 1914. 

2 H. A. Knox. “A Scale, Based on the Work at Ellis Island, for Estimating
Mental Defects”; in Journal of the American Medical Association, vol.
42, pp. 741-47. 1914.
Fig. 4. Illustration of the Ship Test, Originated by Knox and
Used by Pintner and Paterson, and Later by the Army
(Reproduced by permission of the author and of the publishers,
D. Appleton and Company, New York.)

The examiner places four cubes before the subject. He
takes a fifth with which he taps the four cubes in a certain 
order. The examinee must then take the fifth cube and tap 
the four in the same order. A few of the tests of this group 
were borrowed, but many of them were originated by Knox.
They have been widely used. 

Similar in some respects to the Knox group of tests are 
the Pintner-Paterson Performance Tests.1 These tests have
been grouped together and called by the authors a scale. 
There are, in fact, four scales described in the book, but 
these are supplementary to a detailed and individual de- 
scription of the particular tests. The tests are independent 
of language, both in their presentation and in the perform- 
ance by the children. They are largely of the form-board 
type. Some of them have been borrowed from Knox, one 
was a standardization of the Healy Picture-Completion 
Test, and others have been taken from other sources. 
There are fifteen in all. An example of this series is the 
Ship Test, shown in Fig. 4. This test was later used in 
the Army Performance Scale Examination. Each test was 
given by the authors to a large number of children, and 
the results were tabulated in the form of age norms, or stand- 
ards. The tests are individual rather than group tests — 
that is, they are given to children singly rather than in 
classes. The series is useful as an individual test to be 
given to deaf children, to children who do not understand 
English, or to those who may have more ability to deal with 
things than with words. 


4. Profile tests


At the beginning of this chapter, it was said that profile
tests represent the analytical measurement of capacity. 
The purpose of test groups is also, as we have seen, the 
separate measurement of the various capacities. In such 
a group, however, provision is not made for bringing the 
measures of the various traits into direct and easy comparison
with one another. The profile test represents the development
of the test group so that a direct comparison may
readily be made. The direction of development is opposite
to that of the composite scale, in which the scores of all the
individual tests are combined into a composite score.


1 R. Pintner and D. G. Paterson, A Scale of Performance Tests. D. Appleton
& Co., 1917.

The idea of the profile test is not new, and various attempts 
have been made to put it into execution. It has been found 
more difficult in the execution than in the conception, how- 
ever. Yerkes, Bridges, and Hardwick proposed, in connec- 
tion with the description of their point scale, a much more 
comprehensive scale which would be of the profile type.1
These authors suggested a scale of four main divisions. 
They suggested that the individual parts of such a scale 
should measure, respectively, receptivity, imagination (in- 
cluding memory), affectivity or feeling, and thought. Each 
one of these large divisions should contain subordinate 
divisions. Such a scale as this would be broader than a 
merely intellectual test. It would involve both the inven- 
tion and the standardization of a large number of tests 
which we do not at present possess. 

In contrast to this comprehensive plan, there are in ex- 
istence a few profile tests of a very narrow scope. In fact, 
they are even less comprehensive than a profile test which 
covers the range of intellectual capacity. The Downey 
Will Profile Test, for example, which will be mentioned in 
the chapter on tests of non-intellectual capacities, is made 
up of a series of tests, each one of which measures some 
aspect of overt behavior. The Seashore Music Test is an- 
other profile scale, which is described under “Vocational 
Tests.” This scale measures the various constituent ca- 
pacities which are necessary for musical appreciation and 
performance. 


1 R. M. Yerkes, J. W. Bridges, and R. F. Hardwick. A Point Scale for
Measuring Mental Ability. Warwick & York, 1916. 
The general method of the profile scale, then, may be
applied to the analysis of any group of mental capacities. 
It has been very seldom attempted in the general field of 
intellectual capacity. Only one organized and systematic 
attempt in this field has been made, and this attempt, as we 
shall see, is very far from meeting the demands of scientific 
standardization. We shall take this scale, however, as an 
illustration of the type which may possibly be developed at 
some future time for the purpose of making an analytical 
measurement of intellectual capacities. 

The scale which has just been referred to is the psychograph
of Rossolimo.1 The author of this scale first makes
his classification of the capacities which are to be tested. 
This classification may be given in tabular form as follows: 


I. Tonus
      1. Attention, four tests
      2. Will, two tests

II. Impression
      3. Perception, four tests
      4. Memory: (a) five tests
                 (b) ten tests
                 (c) five tests

III. Associative Processes
      5. Comprehension, two tests
      6. Construction or ability to combine, three tests
      7. Skill in mechanics, one test
      8. Imagination, one test
      9. Observation, one test


There are thus thirty-eight individual tests, each one of 
which contains ten items. These thirty-eight tests are 
classified under three large heads and nine smaller divisions. 
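
The count can be verified by a small tally. The Python below is merely an arithmetic check; the dictionary layout and the names in it are illustrative assumptions, the grouping of divisions under the three heads follows the classification as given above, and the total number of items is simply the number of tests multiplied by ten, a figure which the author of the scale does not himself state.

```python
# Number of tests in each division, grouped under the three large heads.
# The three parts of the memory division (five, ten, and five tests)
# are added together.
PSYCHOGRAPH = {
    "Tonus": {"Attention": 4, "Will": 2},
    "Impression": {"Perception": 4, "Memory": 5 + 10 + 5},
    "Associative Processes": {
        "Comprehension": 2,
        "Construction": 3,
        "Skill in mechanics": 1,
        "Imagination": 1,
        "Observation": 1,
    },
}
ITEMS_PER_TEST = 10


def count_divisions(scale):
    """Number of smaller divisions over all heads."""
    return sum(len(divisions) for divisions in scale.values())


def count_tests(scale):
    """Number of individual tests over all divisions."""
    return sum(sum(divisions.values()) for divisions in scale.values())


heads = len(PSYCHOGRAPH)
divisions = count_divisions(PSYCHOGRAPH)
tests = count_tests(PSYCHOGRAPH)
items = tests * ITEMS_PER_TEST
print(heads, divisions, tests, items)
```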

Without giving a complete list of the tests, we may il- 
lustrate from one test in each division. Attention is tested 


1 Beryl Parker, “Psychograph of Rossolimo”; in American Journal of
Insanity, vol. 73, pp. 273-93. 1916. 
by requiring the subject to insert a needle through a small
hole in a card. Will is measured by the resistance to auto- 
matism. For example, the subject counts after the ex- 
aminer and is to stop when the examiner stops. He fails if 
he does not do so. Perception is tested by presenting the 
subject with a card having nine figures in front and one on 
the back. The subject must pick out the one on the back 
from the nine in the front. Memory is tested by requiring 
the subject to pick out ten figures which he has seen before 
out of twenty-five in which the ten are included. Compre- 
hension is tested by requiring the subject to detect an ab- 
surdity in a picture. Construction is tested by requiring 
the child to put together puzzle pictures. Skill in mechanics 
is tested by means of simple mechanical puzzles. Imagina- 
tion is tested by the completion of unfinished pictures, and 
observation by the interpretation of pictures. 

The psychograph of Rossolimo is presented not as a com- 
pletely organized profile test, but as an illustration of the 
type. The development of the profile test involves two large 
problems. The first problem is the satisfactory analysis of 
the capacities which are to be included in the test. Such an 
analysis may be made and probably must be made, to begin 
with, in part by the exercise of psychological insight and 
ingenuity on the part of the person devising the test. No 
analysis which is made merely as a result of reflection, however,
is satisfactory as a final basis of classification. It is
necessary that this analysis be verified and probably modified
by experimentation. We have already seen, in the discussion
of single tests, that the analysis which the psychologist
makes is not likely to be completely successful. The capacities
which he first classes together may not be as closely
related as others which he puts into separate classes. Rossolimo’s
classification offers suggestions for experimentation,
but it has not yet been proven to be satisfactory.


The second problem which confronts the deviser of a
profile test is the selection of appropriate tests to measure 
each of the various capacities, and their individual standard- 
ization. A selection of the tests, of course, is related to the 
problem of the analysis of the capacities. One can only 
check up on the analysis by means of tests. The standard- 
ization of a test is a very laborious process. In the case 
of our scales, it is not necessary to standardize each test 
separately — that is, it is not necessary to find norms of 
performance or standards of performance in each one in- 
dividually. Norms or standards are found for the com- 
posite score on the entire scale. In case of the profile test, 
however, we must have separate norms for each test. Fur- 
thermore, some method of scoring each of the tests must 
be found such that the scores on all the tests will be com- 
parable. This has commonly been done by expressing the 
scores in terms of percentile rank. We shall explain this 
usage more in detail in discussing scales. 
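
The requirement of separate norms and comparable scores may be made concrete by a brief sketch in Python. Everything in it is an assumption made for illustration: the function names, the three trait names, the norm groups, and the child's scores are hypothetical, and the percentile is defined here as the per cent of the norm group scoring below the child, with ties counted as half.

```python
def percentile_rank(score, norm_scores):
    """Per cent of the norm group scoring below `score`, ties counted as half."""
    below = sum(s < score for s in norm_scores)
    ties = sum(s == score for s in norm_scores)
    return 100.0 * (below + 0.5 * ties) / len(norm_scores)


def profile(raw_scores, norms_by_test):
    """One percentile per test, each from that test's own norms.

    The values are kept separate and are not combined into a composite.
    """
    return {
        test: percentile_rank(raw_scores[test], norms_by_test[test])
        for test in raw_scores
    }


# Invented norms: each test has its own norm group and its own raw units.
norms = {
    "attention": [4, 5, 6, 7, 8],
    "memory": [10, 12, 14, 16, 18],
    "comprehension": [1, 2, 3, 4, 5],
}
child = {"attention": 8, "memory": 12, "comprehension": 3}

child_profile = profile(child, norms)
for test, rank in child_profile.items():
    # A crude bar chart: one mark for every ten percentile points.
    print(f"{test:<14}{'#' * int(rank // 10)} {rank:.0f}")
```

The printed rows set the traits side by side in the same units, which is the direct comparison the profile test is meant to afford; a composite scale would instead reduce the three figures to one.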

Rossolimo’s psychograph was not standardized in the 
manner which has been described. Up to the time of the 
publication, it was given to only a very few individuals. 
We have, therefore, no norms. A long process of stand- 
ardization and revision would be necessary to make it 
usable. 

We shall probably not have satisfactory profile tests to 
cover the entire range of intellectual capacities until a con- 
siderable amount of work has been done in the standardiza- 
tion of individual tests. The energy of psychologists of 
recent years has been devoted largely to the development of 
scales of the composite sort. After work with single tests 
has prepared the way for the satisfactory analysis of mental 
capacities, and for the selection of tests to measure these 
capacities, we shall be in a position to proceed in the devel- 
opment of these various desirable forms of the scale. Since 
the further development of specialized tests is a step in the
direction of the construction of profile tests, we shall review a few
attempts to work out such specialized tests. 


5. The development of specialized tests of intellectual capacity 


We have already seen that the construction of profile 
tests depends upon the previous development and standard- 
ization of specialized tests. We have now to consider briefly 
what this development involves, and what the relationship 
is between these specialized tests and the single tests which 
constituted the subject-matter of the testing work in the 
earlier period. Before entering upon the details of this dis- 
cussion, we shall make a preliminary distinction between 
two types of specialized tests. The first is specialized with 
reference to the activity of a particular vocation. Such a 
test measures the activity, whether simple or complex, and 
whether it involves one or several mental functions which are 
required to perform a particular set of activities demanded 
in a vocation. Such a specialized test is illustrated by the 
Muensterberg test for street-car motor-men. This and 
other specialized vocational tests will be discussed in the 
chapter on “The Application of Tests to Vocational Selection.”
The type of specialized tests which we are dealing
with here, on the other hand, is defined by the fact that each 
aims to measure a single mental function. 

We have already seen that the tests of the earlier period 
aim to measure particular mental functions. We have seen, 
furthermore, that the results of the correlation between 
these tests render very difficult the interpretation of the 
mental functions which they measure. It appears to be 
necessary, therefore, to carry on a thorough study of the 
various tests in order that we may be able to identify the 
functions which are measured by them. The intelligence 
tests have side-stepped this problem, and have simply set 
as a criterion the correlation with other tests in general, or
with general measures of ability. They have not attempted 
to find specialized measures of ability. 

A good deal of experimentation has been carried on in 
recent years in the development of single tests. This ex- 
perimentation differs from the studies of single tests in the 
early period, particularly in its emphasis upon tests of a 
more complex nature and in its elaboration of the test 
materials. 

In the field of sensory tests, there has been little recent 
experimentation. The technique of testing sensory dis- 
crimination is probably thoroughly adequate. The work of 
Pillsbury, Seashore, and Yerkes and Watson, sponsored by 
the American Psychological Committee of 1906, still repre- 
sents the most advanced technique in sensory tests. What 
we now need is further study of the interrelationships be- 
tween the tests in the same sensory field in order to deter- 
mine to what extent the discrimination of various types of 
stimuli within the same sense are specialized, and to what 
extent they represent general discriminative ability. 

In the field of motor capacity there has been greater 
activity. If we include as a phase of motor capacity sen- 
sory-motor reaction, we find that there has been some in- 
vestigation both in the field of vocational testing and the 
general analysis of this sort of capacity for its own sake. 
The ability to carry on at the same time and to coördinate a
series of parallel reactions was measured by a device which
was first designed for use in the aviation corps of the army.¹
The apparatus provided three different sets of signals, any
one of which might be set in operation at any time, and which
required three sets of responses from the individual. It
demanded a continuous attentiveness to the three sources of
stimulation. This device was designed, in the first place, to
measure the deterioration in ability to respond which resulted
from a gradual diminution in the oxygen content of the air. It
was not used to compare the abilities of various individuals
with one another. It might, however, be used for this purpose.


1 Knight Dunlap. Report of Air Medical Service, p. 300. Washington,
1919.

A test which measures continuous reaction is the Pursuit 
Meter.¹ This is a device composed of various electrical instruments
which measures the ability of the individual to
adjust his movements to a series of constantly changing 
stimuli. The object to which the subject is to adjust his 
movement consists of a spot on a dial which moves behind 
a line. Through a change in the electric current, this spot 
moves toward one side or the other side of the line. The in- 
dividual, by adjusting a rheostat, tries to bring the spot 
back to the line when it moves away from it. The apparatus 
measures the accumulation of the errors in the individual’s 
attempt to keep the spot on the line. 

Somewhat related to these reaction tests are the various 
form-board tests. The Seguin and Dearborn form boards 
have already been mentioned. Sylvester conducted an ex- 
tensive experiment in the standardization of the Seguin form 
board for various ages.² Another rather elaborate system
of form boards has been devised by Ferguson.³ Several
tests which are objectively similar to form-board tests are
construction tests. T. L. Kelley ⁴ has a test in which he
provides children with blocks of various shapes and asks
them to make a series of objects. The quality of the product
is tested by a comparison with the series of standard photographs.


1 Walter R. Miles, “The Pursuit Meter”; in Journal of Experimental
Psychology, vol. 4, pp. 77-105. 1921.

2 R. H. Sylvester. The Form Board Test. Psychol. Monog., vol. 15. 1915.

3 G. O. Ferguson. Journal of Experimental Psychology, vol. 3, pp. 47-58.
1920.

4 T. L. Kelley, “A Constructive Ability Test”; in Journal of Educational
Psychology, vol. 17, 1-16. 1917.

Stenquist’s Mechanical Aptitude Test is also designed to 
measure special capacity.¹ There are two forms of the test,
one consisting of objects of everyday use, such as a sash 
fastener, or a hinge, which are presented to the subject in 
parts and which he is required to assemble. The other form 
is a paper test in which one is required to designate which of 
several pairs of drawings represent parts of the same object. 
The test measures something different from general intelli- 
gence, but it is not yet demonstrated just what special 
capacity it does measure. 

A rather widely used maze test, devised by Porteus,² is
thought by its author to measure foresight or circumspec- 
tion. The mazes are reproduced in Fig. 5. The subject is 
directed to trace the shortest line from the entrance, at S, to 
the other opening. This test correlates rather closely with 
an intelligence test, and the opinion that it measures a special 
capacity is based on introspection and analysis rather than
on statistical evidence. 

Kohs ³ has a test which has been elaborately worked out
and which requires the child to copy a pattern by putting 
together blocks of various colors. Kohs believes that his 
test measures the higher intellectual processes, and defines 
these processes as consisting of analysis and synthesis. He 
finds that his test correlates well with general tests of in- 
telligence. This, of course, raises the question whether the 
test is really a specialized or a general test. 


1 J. L. Stenquist. Measurements of Mechanical Ability. New York:
Teachers College, Columbia University, 1923.

2 S. D. Porteus, “Motor Intellectual Tests for Mental Defectives”; in
Journal of Experimental Pedagogy, vol. 3, pp. 127-35 (1915); and “The
Measurement of Intelligence: Six Hundred and Fifty-Three Children
Examined by the Binet and Porteus Tests”; in Journal of Educational
Psychology, vol. 9, pp. 13-31.

3 S. C. Kohs. Intelligence Measurement. New York: The Macmillan
Company, 1919.


FIG. 5. SPECIMENS OF THE PORTEUS MAZE TEST
(Copied by permission of the C. H. Stoelting Co.) 


As final examples of tests of special capacities, or tests 
which aim to measure special capacities, may be mentioned 
tests which are designed to measure reasoning capacity. 
Burt published, some years ago, a series of tests which consist
of verbal problems, such as those which are used to
illustrate logical operations.¹ Thurstone has a somewhat
similar but more formal syllogism test.²

Such tests, like all those which have been mentioned, 
contribute little to the solution of our problem until we are 
able to identify the mental processes which are measured by 
them. We find upon investigation that frequently tests 
which seem objectively to be similar are really different, and 
that tests which seem to be different really measure similar 
mental processes. We may illustrate the mode of attack 
upon the problem with two examples. 

The first example is in the field of memory tests. In 
the author’s laboratory, during the past few years, there has 
been experimentation with memory tests. The procedure 
was to collect a half-dozen or more tests of memory, to 
standardize the procedure of giving these tests, and then to 
study their intercorrelation and their correlations with other 
tests. We are accustomed to distinguish in our thinking 
between rote memory, or the memory of objects which are 
not logically related to one another, and logical memory, or 
memory of sense material. One problem, therefore, is to 
determine whether the two types of memory are really dis- 
tinct. Another problem is to determine whether the various 
measures of rote memory measure the same thing, and 
whether the various measures of logical memory measure 
the same thing. Finally, we must face the question whether 
the tests of memory, taken as a whole, can be said to measure 
a capacity which is distinct from tests of other sorts. 

Before we can answer any of these questions, it is necessary,
as was seen in our earlier discussion, to determine
whether our tests are consistent with themselves. Many
cases of seeming lack of correlation between tests are due to
unreliability in this respect.


1 C. Burt. “Development of Reasoning in School Children”; in Journal
of Experimental Pedagogy, vol. 5, pp. 68-77. 1919.

2 L. L. Thurstone. Syllogism Test A. Division of Applied Psychology,
Carnegie Institute of Technology.

The preliminary experiments in memory which have been 
mentioned have not brought us to a final conclusion con- 
cerning these questions. We may say tentatively, however, 
that when we have found tests of a fair degree of reliability 
we are able to distinguish between rote memory and memory 
of sense material. This is shown by the fact that the tests of 
rote memory correlate more closely among themselves than 
they do with tests of logical memory. The distinction, 
however, is not a sharp one. There seems to be a re- 
lationship between all of the memory tests, which indicates 
that we are measuring to some degree the same capacity. 
On the other hand, the correlations between the various 
rote memory tests, or the various logical memory tests, are 
not so high as to warrant us in concluding that we are 
measuring precisely the same thing by the various tests. 
Each test, in other words, is to some extent specialized, and 
different memory tests measure to some extent different 
things. Furthermore, when we correlate the memory tests 
with tests of general intelligence, we find that there is con- 
siderable relationship between memory and general mental 
capacity. Our memory tests, therefore, are not completely 
specialized with reference to general mental ability. This 
suggests immediately the question of the constitution or the 
nature of general mental ability, which we shall have to 
consider more particularly in a later chapter. 
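
The comparison just described may be sketched in code. What follows is an illustration only: the test names and every correlation value are hypothetical, invented for the example, and are not figures from the author's laboratory. The sketch averages the correlations within each group of memory tests and between the two groups.

```python
from itertools import combinations, product
from statistics import mean

# Hypothetical intercorrelations between four memory tests (illustrative only).
ROTE = ["digits", "nonsense_syllables"]
LOGICAL = ["story_recall", "sentence_recall"]
R = {
    frozenset(["digits", "nonsense_syllables"]): 0.55,
    frozenset(["story_recall", "sentence_recall"]): 0.50,
    frozenset(["digits", "story_recall"]): 0.30,
    frozenset(["digits", "sentence_recall"]): 0.35,
    frozenset(["nonsense_syllables", "story_recall"]): 0.25,
    frozenset(["nonsense_syllables", "sentence_recall"]): 0.30,
}


def mean_within(group):
    """Average correlation among the tests of one group."""
    return mean(R[frozenset(pair)] for pair in combinations(group, 2))


def mean_between(group_a, group_b):
    """Average correlation between tests drawn from two different groups."""
    return mean(R[frozenset(pair)] for pair in product(group_a, group_b))


within_rote = mean_within(ROTE)
within_logical = mean_within(LOGICAL)
between = mean_between(ROTE, LOGICAL)

# The two types of memory are distinguishable if tests agree more closely
# with their own kind than with the other kind.
distinct = within_rote > between and within_logical > between
```

With these invented values the within-group averages exceed the between-group average, which is the pattern reported in the text. That the between-group average is still well above zero corresponds to the remark that the distinction is not a sharp one.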

We may close with one further illustration of the problem 
which confronts us in the development of specialized tests 
of some one mental capacity. A study of motor abilities 
which was made by Perrin ¹ indicates the complexity of the
problem. One might suppose motor ability to be a fairly
homogeneous affair. We commonly speak of persons as
having a high degree or a low degree of manual skill, or skill
of movement. It is commonly supposed that the ability
to master a skilled operation varies among different persons,
and is rather general in its nature. In order to test this
assumption, Perrin gave to about fifty persons three complex
motor tests and fourteen simple tests. The complex
tests were, first, the Bogardus fatigue test, which requires
that a person place a block on a rotating platform; second,
a card-sorting test, which requires that cards be sorted into
compartments or piles according to some mark upon them;
and third, a new motor-coördination test, which requires
that a person shall trace simultaneously a square with one
hand and a triangle with the other. The fourteen simple
tests were of the conventional sort.


1 F. A. C. Perrin. “Experimental Study of Motor Ability”; in Journal of
Experimental Psychology, vol. 4, pp. 24-56. 1921.

Contrary to what we might expect, there was found to
be very little correlation between these various motor tests.
Even the complex tests, on the whole, correlated only slightly
with each other or with the simple tests. In the light of these
results, Perrin goes on to discuss the question, “What is
motor ability?” He inquires what the evidence indicates
as to whether there is such a thing as general motor ability; 
whether it is a complex or simple unit function; whether it is 
based on a few general modes of reaction; whether it is closely 
related to intelligence or to temperament. He does not come
to a definite conclusion, except that motor ability is neither
general, nor a complex of simple unit functions, nor a matter
of a few modes of reaction.

There is one condition which we have found necessary to 
satisfactory data upon problems such as these, and which 
was not met in Perrin’s study. This condition is the 
measurement of the reliability of individual tests, or the 
extent to which each one is consistent. The author’s experience
has been that sometimes tests which one would suppose
to be thoroughly consistent turn out not to be so. Experi- 
ments which he has made with a tracing test, for example, 
prove that this test, although carefully carried out and ad- 
ministered, may have very little reliability. When the re- 
liability was first measured, it was found to be less than .30. 
By altering the conditions in the administration of the test, 
it was raised to about .60. If, now, Perrin’s tests, as is 
quite possible, had very low reliability, this may have ac- 
counted for the low intercorrelation. 
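
The effect of low reliability upon intercorrelation may be made concrete. The sketch below assumes the classical attenuation relation, which the text does not state explicitly: the correlation observed between two tests equals the correlation between the capacities measured, multiplied by the square root of the product of the two reliabilities. The function names and the underlying correlation of .80 are hypothetical, chosen only for illustration.

```python
import math


def observed_correlation(true_r, reliability_x, reliability_y):
    """Correlation expected between two imperfectly reliable tests,
    under the classical attenuation relation (an assumption here)."""
    return true_r * math.sqrt(reliability_x * reliability_y)


def ceiling(reliability_x, reliability_y):
    """Largest correlation two tests can show, reached when the
    capacities they measure correlate perfectly (true_r = 1)."""
    return observed_correlation(1.0, reliability_x, reliability_y)


# Reliabilities of about .30 and .60 are those quoted for the tracing test;
# the underlying correlation of .80 is hypothetical.
before = observed_correlation(0.80, 0.30, 0.30)
after = observed_correlation(0.80, 0.60, 0.60)
```

Under these assumptions two tests, each with a reliability of .30, could not correlate above .30 even if they measured identical capacities. Low intercorrelations of the kind Perrin found are therefore hard to interpret until the reliabilities are known.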

The examples which have been given may at least
serve to indicate something of the nature of our problem and 
of the complications in which it is involved. The investi- 
gation of the feasibility of specialized tests is one of the 
major problems of the future development in this field. It 
can probably best be attacked by an intensive study of 
certain of the aspects of mental capacity which we are 
accustomed to regard as fairly distinct and definite, such as 
motor ability and memory ability. This study may result 
in the revision of our conception concerning the classification 
of mental capacities, or perhaps in the development of an 
entirely new system of classification. 


CHAPTER VI 
THE EARLY DEVELOPMENT OF POINT SCALES 


1. The first point scale 


WHILE the development of point scales, as has already been
said, was very largely influenced by the studies of correlation, 
the first point scale was a direct outgrowth of the Binet age 
scale. This was the scale which was developed by Yerkes, 
in association with Bridges and Hardwick.¹ The scale is
composed of twenty tests, nineteen of which are taken from 
the Binet scale. The actual number of tests is somewhat 
greater than is indicated by this statement, since each one is 
composed of a number of parts. Thus the test of memory 
span for digits is composed of ten parts, or five pairs of in- 
creasing length and difficulty. The first pair contains three 
digits each and the last, or most difficult pair, seven digits. 
In the same way each test contains a short, graded series. 
In general the easier tests are in the first part of the scale 
and the more difficult ones in the later part, but there is not 
a regular gradation in the difficulty of the successive tests. 
There is, rather, a gradation in the difficulty of the parts 
within each test. 

The tests are not arranged according to age, as we have
seen, nor are they scored in terms of age. The various forms
of the Binet scale are scored by giving, for each test passed,
credit amounting to a fraction of a year’s mental age. Thus,
in the Stanford Revision, two months, or one sixth of a
year’s credit, is given for passing each test. In the point
scale, on the other hand, the child is not given credit directly
in terms of mental age, but in terms of points. Thus for
each part of the test in memory span for digits the child is
given one point. The possible score for this test is five
points.


1 R. M. Yerkes, J. W. Bridges, and R. S. Hardwick. A Point Scale for
Measuring Mental Ability. Baltimore: Warwick & York, 1915.

This method of scoring seems to constitute rather a differ- 
ence in method than in principle, for the point scores are 
interpreted by comparing them with a table of age standards. 
Thus, if a child makes a point score of 58 out of a possible 
100, his mental age is ten. If his score is 70, his mental age is
twelve. The authors of the point scale criticized the Binet 
scale because it assumes that each stage of mental develop- 
ment corresponds to a certain critical age, and that there is a 
“correlation between the different functions at different 
stages of development.” It seems that any scale which in- 
terprets the scores in terms of age standards assumes this 
correspondence in the same fashion as does an age scale, and 
this is true of all of our point scales which are designed for 
children. After the child’s score has been referred to the 
table of age norms, his brightness score may be calculated in 
a slightly different way from that which is done with the 
Binet scale, as we shall see in a moment, but the fundamental 
conception is the same.¹
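
The reference of a point score to a table of age standards may be sketched as follows. Only two rows of the table are taken from the text (a score of 58 for mental age ten, and 70 for mental age twelve); the remaining rows are hypothetical placeholders. The straight-line interpolation between rows and the clamping at the ends of the table are likewise assumptions made for the illustration, not the published procedure.

```python
from bisect import bisect_right

# (point score, mental age in years). Only 58 -> 10 and 70 -> 12 come from
# the text; the other rows are hypothetical placeholders.
AGE_NORMS = [(40, 7), (46, 8), (52, 9), (58, 10), (64, 11), (70, 12)]


def mental_age(score):
    """Interpolate a mental age from the table of age norms."""
    scores = [s for s, _ in AGE_NORMS]
    if score <= scores[0]:
        return float(AGE_NORMS[0][1])   # clamp below the table (assumption)
    if score >= scores[-1]:
        return float(AGE_NORMS[-1][1])  # clamp above the table (assumption)
    i = bisect_right(scores, score)     # index of the first norm above the score
    (s0, a0), (s1, a1) = AGE_NORMS[i - 1], AGE_NORMS[i]
    return a0 + (a1 - a0) * (score - s0) / (s1 - s0)
```

Because the final step is a conversion to mental age, the sketch also shows why the point scale rests upon the same assumption of age correspondence as the age scale: revising the norms means only replacing the rows of the table.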

Another point which the authors make in favor of the 
point scale is that it uses the method of partial credits in 
scoring, as distinguished from the all-or-none method. This 
point has some justification, but the partial-credit method 
can also be applied, and is applied, in a measure, in the age 
scale. By the all-or-none method is meant that method of 
scoring in which the child is either given full credit or no 
credit at all in the test. By the partial-credit method is
meant one in which the child is given some credit if he passes
a part of the test, and an additional credit if he passes another
part. This is illustrated in the memory test which has
already been alluded to. Now it happens that, in this
memory test, partial credit is also allowed in the Binet scale,
although by a different procedure. Thus, in the Stanford
Revision, a child is given credit for passing one test at mental
age three if he repeats three digits, one at mental age four if
he repeats four digits, at mental age five for five digits, ten for six
digits, fourteen for seven digits, and eighteen for eight digits.
The same sort of distribution of graded items of the test at
different mental ages appears repeatedly in the age scales.


1 For a fuller discussion of the relation between point scales and the Binet
scale, see F. N. Freeman, “A Critique of the Yerkes-Bridges-Hardwick
Comparison of the Binet-Simon and the Point Scales”; in Psychological Review,
vol. 24, p. 484. 1917.

While the fundamental principle of the point scale, there- 
fore, is not to be found in its difference from the Binet scale, 
there are some characteristics which commend it from the 
point of view of convenience. It is easier to revise the 
norms of the point scale, and it is not essential that every 
test be given a separate age standardization. A tentative 
series of age standards may be derived from the application 
of the test to a small number of children, and then, after the 
tests have been applied to a larger number, the age stand- 
ards may be changed, if necessary, in accordance with the 
accumulation of scores. Furthermore, if it is desirable to do 
so, it is possible to have different norms for different groups, 
such as groups belonging to various races or different social 
environments. It is possible to do the same thing with the 
age scale by applying a correction to the I.Q., but it involves 
a clumsier procedure. 

The point-scale method of organization is also easier to 
apply in the development of scales for the measurement of 
other kinds of mental capacity, such as feeling, or will, or 
moral attitude. In general it is a more flexible type of or- 
ganization than the age scale, and is the one which now pre- 
vails. 

Finally, the point scale has the advantage of being easier
to administer than the age scale. It does not require much 
study to gain a knowledge of the method of presenting it and 
of scoring it. 

The method of reckoning the relative intellectual capacity 
or the brightness of the child with the Yerkes Point Scale 
is somewhat different from that used with the age scale. In 
the age scale, as will be remembered, the child’s brightness is 
found by finding the ratio of his mental age to his chrono- 
logical age. This means that the child’s performance is 
compared with the performance of other children of other 
ages. In the point scale, on the contrary, the child’s per- 
formance is compared only with that of other children of 
his own age. This is done by finding the ratio of the child’s 
score to the average score of children of his own age. This 
ratio is called the Coefficient of Intelligence. The Coefficient
of Intelligence, like the I.Q., is 1.00 for the normal or average 
child, above 1.00 for the superior child, and below 1.00 for the 
inferior child. Just what the relationship is between the 
distribution of these two ratios has never been worked out. 
We cannot assume that an I.Q. of 120, for example, means 
exactly the same as the Coefficient of Intelligence of 1.20. 
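
The two ratios may be set side by side in a short sketch. The function names are hypothetical. The example treats the age-ten standard of 58 points, quoted earlier in the chapter, as the average score of ten-year-old children; that identification is an assumption made for the illustration.

```python
def intelligence_quotient(mental_age, chronological_age):
    """Age-scale brightness: ratio of mental age to chronological age."""
    return mental_age / chronological_age


def coefficient_of_intelligence(score, average_score_for_age):
    """Point-scale brightness: ratio of the child's score to the average
    score of children of his own age."""
    return score / average_score_for_age


# A ten-year-old child who scores 70 points. By the age standards quoted
# earlier, 70 points corresponds to mental age twelve and 58 points to
# mental age ten (58 is taken as the ten-year average, an assumption).
iq = intelligence_quotient(12, 10)
ci = coefficient_of_intelligence(70, 58)
```

For this child the quotient is 1.20 and the coefficient is about 1.21. The two agree closely here but are computed from different quantities, so nothing guarantees equal values in general. That is the reason for the caution expressed above.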

The Yerkes Point Scale, like the Stanford Revision of the 
Binet scale, has had very wide use in public schools for mak- 
ing individual examinations. Both of these tests, further- 
more, were used for individual examinations of men of 
low-grade intelligence in the army. Probably the greatest 
importance of this scale, however, is its influence on the sub- 
sequent development of tests, including the army group test. 

The edition of the original book describing the Point
Scale is now exhausted and a revision of the book has been
published,¹ containing an account of the first scale, the
Pre-Adolescent Scale, with minor changes, and also an account of
two additional scales, the Adolescent-Adult Scale and the
Infant Scale.


1 R. M. Yerkes and Josephine Curtis Foster. A Point Scale for Measuring
Mental Ability. 1923 Revision. Baltimore: Warwick & York, 1923.


2. The Herring Revision 

The Herring Revision of the Binet test has about the same 
relationship to the original from which it was derived as has 
the Yerkes Point Scale.¹ This is also a point scale in that the
child is given a specified number of points for each test which 
he passes, and his mental age is calculated by comparing the 
total number of points to his credit with a series of age
standards. The scale is made up mostly of tests which are
derived from the Binet scale, but these are supplemented 
by several new tests. There are thirty-eight in all. 

The chief novelty in the scale is its mode of organization. 
The tests, instead of being arranged in a single series, are 
placed in five groups. The first group may be used alone 
and constitutes a very brief test. The test may be ex- 
tended by adding the second group to the first, and so on. 
If any of the groups beyond the first one are used, a scheme 
is given according to which one may omit part of the tests to 
avoid duplication of levels of difficulty. If the child makes 
a relatively high score on the first group, the earlier and 
easier tests of the second group are omitted. On the other 
hand, if he makes a relatively low score, the later or more 
difficult of the tests of the second group are omitted. The 
same procedure is followed in the succeeding groups. 

The scale is a simple one to administer and to score. The 
entire directions for giving and scoring, including the table 
of norms, are included in a small book of fifty-six pages. The 
scale constitutes an individual test, as in the case of the 
Binet Scale and the Yerkes Point Scale. It requires less
preparation, however, and, if it proves to be as reliable, it is
probably preferable to the more cumbersome and longer individual
scales. The intelligence quotient is calculated in
the same way as in the case of the Stanford Revision.


1 John P. Herring. Herring Revision of the Binet-Simon Test; Examination
Manual, Form A. World Book Company, 1922.


3. The United States Army mental tests 


At the time that the American Psychological Association, 
through its president, Dr. Yerkes, and its council, offered its 
services to the United States Army in the prosecution of 
the War, and proposed to organize intelligence tests to be 
given to the army recruits, the chief tests which were in use 
were the individual age scale and the individual point scale. 
A considerable number of test groups had been organized, 
but these were usually not employed extensively except by 
their originators. A few tests had been administered to 
groups, but no well-organized group scales had been devised. 

The group of army psychologists, who, after a period of 
experimentation, were entrusted, under the direction of Dr. 
Yerkes, with the organization of the intelligence examina- 
tions, realized that it would be necessary, in order to ad- 
minister tests on a large scale, to develop a group test. It 
appears to the lay observer that these psychologists created 
out of whole cloth radically new methods of examining. On 
the contrary, they made use of all the earlier experiments 
with tests which we have reviewed, including the studies of 
correlation, and simply took the next logical step in ad- 
vance. This step was taken more quickly than would other- 
wise have been the case, and the mental-test movement ac- 
quired a tremendous impetus as a result of the large number 
of examinations which were given in the army and of the 
publicity which it received. One psychologist, Otis, how- 
ever, was on the point of taking this step himself at the time 
the army tests were organized, and he contributed his experi- 
ence and his plan to the army psychologists. 

The scales which were principally employed in the army
were five. Two were scales which we have already de- 
scribed — the Stanford Revision and the Yerkes Point Scale. 
Two of the others were group tests, and one was an indi- 
vidual performance test. 


4. The Army Scale Alpha 


The most widely used of the army scales was Scale Alpha. 
This was a group test which was suitable for administration 
to men who could understand and could read English. To 
those who could not understand English because of foreign 
origin, or because they were illiterate, or because they were 
mentally defective, was given a second group test which did 
not involve the use of language. It consisted of a variety 
of pictures and diagrams. The directions were given by
pantomime. The men who failed to make a certain score on 
this second test, which was called Beta, were given an in- 
dividual examination. The individual examination was 
either one of the two which have been mentioned — the 
Stanford-Binet or the Yerkes Point Scale — or a fifth test 
which was an individual performance test. These various 
tests may be described in a little more detail. 
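
The order in which the examinations were applied, as just described, may be summarized in a short sketch. The function names and the `required_score` parameter are hypothetical; the text says only that "a certain score" was required on Beta and does not give its value.

```python
# The three individual examinations named in the text.
INDIVIDUAL_EXAMINATIONS = (
    "Stanford-Binet",
    "Yerkes Point Scale",
    "individual performance test",
)


def first_examination(understands_and_reads_english):
    """Group test a recruit receives first: Alpha for men who can
    understand and read English, Beta for the others."""
    return "Alpha" if understands_and_reads_english else "Beta"


def needs_individual_examination(beta_score, required_score):
    """Men who fail to make the required score on Beta are examined
    individually. The value of required_score is not given in the text."""
    return beta_score < required_score
```

A man found to need individual examination would then be given one of the three examinations listed in `INDIVIDUAL_EXAMINATIONS`.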

On account of the historical importance of Scale Alpha, 
and because it stands as the type of our group-point scales, 
we may reproduce it almost in full. 

Successive tests are on alternate pages, and tests 5 to 8 are 
printed upside down in order to prevent the men from looking
forward to a new test until the signal is given.¹


1 For a brief description of the army tests, and the manual for giving
them, see Clarence S. Yoakum and Robert M. Yerkes. Army Mental Tests. 
Henry Holt & Co., 1920. 

For a full technical account of the derivation of the tests, of the details of 
the tests themselves, and of the results of the applications in the army, see 
the official report entitled Psychological Examining in the United States 
Army, edited by Robert M. Yerkes, vol. 15, National Academy of Sciences, 
Washington, 1921. 

It will be seen that the scale consists of eight tests.
Test 1 is a so-called directions test. Each item is to be 
marked by the examinee according to directions to be given 
by the examiner. For example, the directions for the first 
item of test 1, form 6, are as follows: 

“Attention! Attention always means pencils up. Look at the
circles at one. When I say ‘Go,’ but not before, make a cross in
the second circle and also a figure one in the third circle. ‘Go!’”
(Allow not over five seconds.)


The later items of the test are more difficult than the earlier 
ones. For example the directions for item 12 are as follows: 


“Attention! Look at twelve. If six is more than four, then, 
when I say ‘Go,’ cross out the five, unless five is more than seven, 
in which case draw a line under number six. ‘Go!’” (Allow not
over ten seconds.) 


This test is what might be called a test of the mental span, 
or the ability to keep in mind a number of things at once. 
It serves also as a means of determining whether the men 
understand verbal directions, and as a means of weeding out 
those who do not understand English. 
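
The demand which item 12 makes upon the mental span can be seen by writing its conditions out explicitly. The sketch below is merely one reading of the quoted directions. The function name is hypothetical, and the final branch (doing nothing if six were not more than four) is an assumption, since the directions do not cover that case.

```python
def item_12_action():
    """Resolve the nested conditions in the directions for item 12."""
    if 6 > 4:        # "If six is more than four"
        if 5 > 7:    # "unless five is more than seven"
            return "draw a line under six"
        return "cross out the five"
    return "do nothing"  # case not covered by the directions (assumption)


action = item_12_action()
```

The examinee must hold both conditions in mind, together with the two possible actions, until the signal is given. The correct response is to cross out the five.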

Test 2 is simply a series of arithmetic problems. This 
might seem at first glance to be merely an educational 
test. It does, of course, require that the individual shall 
have had instruction in arithmetic. It was assumed, how- 
ever, that all of the men being examined had had sufficient 
instruction to solve these problems if they had the mental 
capacity to do so. This was probably true for most men, 
but it was not true for all, particularly some of the foreign- 
born. 

Test 3 is called a test of common sense. It is assumed 
that every person examined has had the experiences which 
will enable him to give the correct answer to the questions, 
provided he has ordinary intelligence. 


Form 6    GROUP EXAMINATION ALPHA    Score ..... Rating .....

TEST 2

Get the answers to these examples as quickly as you can.
Use the side of this page to figure on if you need to.

SAMPLES:
1 How many are 5 men and 10 men? .......... Answer ( 15 )
2 If you walk 4 miles an hour for 3 hours, how far do you walk? .......... Answer ( 12 )

1 How many are 40 guns and 6 guns? .......... Answer ( )
2 If you save $6 a month for 5 months, how much will you save? .......... Answer ( )
3 If 32 men are divided into squads of 8, how many squads will there be? .......... Answer ( )
4 Mike had 11 cigars. He bought 3 more and then smoked 6. How many cigars did he have left? .......... Answer ( )
5 A company advanced 6 miles and retreated 3 miles. How far was it then from its first position? .......... Answer ( )
6 How many hours will it take a truck to go 48 miles at the rate of 4 miles an hour? .......... Answer ( )
7 How many pencils can you buy for 40 cents at the rate of 2 for 5 cents? .......... Answer ( )
8 A regiment marched 40 miles in five days. The first day they marched 9 miles, the second day 6 miles, the third 10 miles, the fourth 9 miles. How many miles did they march the last day? .......... Answer ( )
9 If you buy 2 packages of tobacco at 8 cents each and a pipe for 55 cents, how much change should you get from a two-dollar bill? .......... Answer ( )
10 If it takes 8 men 2 days to dig a 160-foot drain, how many men are needed to dig it in half a day? .......... Answer ( )
11 A dealer bought some mules for $900. He sold them for $1,000, making $25 on each mule. How many mules were there? .......... Answer ( )
12 A rectangular bin holds 600 cubic feet of lime. If the bin is 10 feet wide and 5 feet deep, how long is it? .......... Answer ( )
13 A recruit spent one-eighth of his spare change for post cards and four times as much for a box of letter paper, and then had 60 cents left. How much money did he have at first? .......... Answer ( )
14 If 2½ tons of hay cost $20, what will 4½ tons cost? .......... Answer ( )
15 A ship has provisions to last her crew of 600 men 6 months. How long would it last 800 men? .......... Answer ( )
16 If a train goes 200 yards in 10 seconds, how many feet does it go in a fifth of a second? .......... Answer ( )
TEST 3


This is a test of common sense. Below are sixteen questions. Three
answers are given to each question. You are to look at the answers
carefully; then make a cross in the square before the best answer to
each question, as in the sample.


SAMPLE Why do we use stoves? Because
[ ] they look well
[X] they keep us warm
[ ] they are black


Here the second answer is the best one and is marked with a cross.
Begin with No. 1 and keep on until time is called.


1 If plants are dying for lack of rain, you should
[ ] water them
[ ] ask a florist’s advice
[ ] put a fertilizer around them

2 A house is better than a tent, because
[ ] it costs more
[ ] it is more comfortable
[ ] it is made of wood

3 Why does it pay to get a good education? Because
[ ] it makes a man more useful and happy
[ ] it makes work for teachers
[ ] it makes demand for buildings for schools and colleges

4 If the grocer should give you too much money in making
change, what is the right thing to do?
[ ] buy some candy of him with it
[ ] give it to the first poor man you meet
[ ] tell him of his mistake

Go to No. 9 above

9 Why are warships painted gray? Because gray paint
[ ] is cheaper than other colors
[ ] is more durable than other colors
[ ] makes the ships harder to see

10 Why should all parents be made to send their children
to school? Because
[ ] it prepares them for adult life
[ ] it keeps them out of mischief
[ ] they are too young to work

11 The reason that many birds sing in the spring is
[ ] to let us know spring is here
[ ] to attract their mates
[ ] to exercise their voices

12 Gold is more suitable than iron for making money because
[ ] gold is pretty
[ ] iron rusts easily
[ ] gold is scarcer and more valuable
TEST 4


If the two words of a pair mean the same or nearly the same, draw a
line under same. If they mean the opposite or nearly the opposite,
draw a line under opposite. If you cannot be sure, guess. The two
samples are already marked as they should be.

SAMPLES good — bad................... same — opposite
        little — small............... same — opposite

 1 cold — hot........................ same — opposite 1
 2 long — short...................... same — opposite 2
 3 bare — naked...................... same — opposite 3
 4 joy — happiness................... same — opposite 4
 5 [illegible]....................... same — opposite 5
 6 shrill — sharp.................... same — opposite 6
 7 [illegible]....................... same — opposite 7
 8 [illegible]....................... same — opposite 8
 9 careless — anxious................ same — opposite 9
10 crude — coarse.................... same — opposite 10
11 commend — approve................. same — opposite 11
12 linger — loiter................... same — opposite 12
13 agony — bliss..................... same — opposite 13
14 defective — normal................ same — opposite 14
15 competent — qualified............. same — opposite 15
16 knave — villain................... same — opposite 16
17 [illegible]....................... same — opposite 17
18 wax — wane........................ same — opposite 18
19 adversary — colleague............. same — opposite 19
20 altruistic — egotistic............ same — opposite 20
21 [illegible] — sly................. same — opposite 21
22 [illegible]....................... same — opposite 22
23 asunder — apart................... same — opposite 23
24 deplete — exhaust................. same — opposite 24
25 superfluous — essential........... same — opposite 25
26 recoup — recover.................. same — opposite 26
27 celibate — married................ same — opposite 27
28 recant — disavow.................. same — opposite 28
29 avarice — cupidity................ same — opposite 29
30 aggrandize — belittle............. same — opposite 30
31 decadence — decline............... same — opposite 31
32 nullify — annul................... same — opposite 32
Test 4 is a measure of the ability of the individual to
apprehend the relationship of sameness and oppositeness 
of meaning in words. It is assumed that the persons tested 
know the meaning of the words. It is obvious, for the latter 
part of the test at least, that the test is a measure of the 
understanding of vocabulary as well as a measure of the 
ability to give opposites. 

Test 5 is a measure of the ingenuity of an individual as 
indicated by his ability to rearrange words and make them 
into a sentence. To some extent also, of course, it is a measure
of information, as in the case of item eighteen.

Test 6 is again a measure of ingenuity, this time in the 
field of number. It is probably more nearly a pure intelli- 
gence test than are some of the others. 

Test 7 is a measure of the ability to see relationships. 
Assuming that the information demanded is at the command 
of all those who are examined, it has proved to be a good in- 
telligence test. It is called the analogy test, and is the one 
which was introduced by Yerkes in his point scale in con- 
trast to those which were borrowed from the Binet scale. 

Test 8 has been criticized as measuring experience rather 
than intelligence, but the possession of the information 
which is demanded by it, assuming the environment of the 
persons tested to have been fairly similar, is regarded as a 
fairly good measure of intelligence. 

Each of the tests has a time limit. The various items of 
test 1, for example, are given from five seconds to twenty- 
five seconds. The remaining tests have the following time 
allowances: No. 2, five minutes; No. 3, one and one half 
minutes; No. 4, one and one half minutes; No. 5, two min- 
utes; No. 6, three minutes; No. 7, three minutes; and No. 
8, four minutes. The time limits are so set that but a small 
percentage, approximately five, shall be able to finish the 
test. The score which an individual makes, therefore, depends
in part upon his speed of performance. However, the
tests are not purely speed tests, as is sometimes assumed.
In the first place, the tests increase in difficulty, so that if a
person’s mental capacity is very limited, he begins to slow
down sooner than he otherwise would. In the second place,
the rapidity of a person’s performance depends in part upon
the ease with which he can perform the tasks which are set to
him.

The examinee is given one point credit for every item 
which he answers correctly. Since there are 212 items in all 
of the tests taken together, the highest possible score is 212. 
Scores of 212 have been reported, but it is very rare that an 
individual scores above 200. 

Detailed directions for scoring the tests are given in the 
manual. One procedure deserves comment. It will be 
noticed that in tests 4 and 5 there is an even chance of 
giving a correct answer if one merely guesses. In each test 
the examinee is directed, “If you cannot be sure, guess.” 
It is assumed that the examinee will guess on some of the 
items, and that, upon some of the items on which he guesses, 
he will obtain a correct answer. This would give him a
higher score than he should have if the score is intended to
represent only those items to which he knows the answer.
A correction is therefore applied to the scores on these tests. 
The correction assumes that the examinee has made as many 
correct answers by guessing as he has given wrong answers. 
His score is, therefore, found by subtracting the number of 
wrong answers from the number of right answers he has 
given. We shall examine the validity and usefulness of this 
procedure in one of the chapters on technique. 
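
The correction just described may be illustrated by a brief sketch in Python. The function name and the figures in the example are ours, chosen only for illustration; they are not taken from the manual. The sketch assumes a test in which every item offers two choices, and it leaves omitted items out of the count altogether.

```python
def corrected_score(right: int, wrong: int) -> int:
    """Rights-minus-wrongs score for a test with two choices per item.

    The correction assumes that the examinee has gained as many correct
    answers by guessing as he has given wrong answers, so the number of
    wrong answers is subtracted from the number of right answers.
    """
    if right < 0 or wrong < 0:
        raise ValueError("counts of answers cannot be negative")
    return right - wrong


# Hypothetical case: an examinee knows 20 items and guesses on 20 more.
# With an even chance on each guess we expect about 10 lucky hits and
# 10 misses, so he is marked with 30 right and 10 wrong.
known, lucky, missed = 20, 10, 10
print(corrected_score(known + lucky, missed))
```

In this hypothetical case the corrected score equals the number of items actually known, which is the purpose of the correction. When the examinee's luck departs from an even split, the corrected score will depart from the number known by a corresponding amount.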

There has been a good deal of discussion of the results of 
the Army Alpha Scale in terms of the letter rating into which 
the scores were translated. Some have spoken of the letter 
ratings as though they represented distinct and clearly definable
levels of mental capacity. Thus all those who received
a grade below C have been spoken of as mentally
defective. One writer, on the other hand, has implied that
the rating A is a purely arbitrary designation, and explains
his contention in this way: The timing of the tests, he
writes, was so adjusted that five per cent of the men could
finish. To the men who finished was given the grade A.
It is therefore due solely to this arbitrary selection of a time
limit that approximately five per cent of the men received
this grade.¹ As a matter of fact, five per cent of the men did
not finish the test as a whole, and the score did not depend
merely upon the number of tests which were attempted, but
on the number which were correct. The distribution of the
scores among the various letter grades was made in a totally
different way.

The scores which were assigned to the various letter 
grades were so adjusted as to give a distribution of the letter 
grades approximating the normal distribution. We shall 
see how this works out by examining the scores to which the 
various letter grades were given, and the distribution of the 
scores of the men who received these grades. 


TABLE VIII. LETTER RATINGS IN ARMY ALPHA


Letter rating                 E and D-   D       C-      C       C+       B         A
Limit of scores               0-14       15-24   25-44   45-74   75-104   105-134   135-212
Range of scores               14         9       19      29      29       29        77
Per cent of principal draft
  receiving these scores ¹    7.1        17.0    23.8    25.0    15.2     8.0       4.1


1 See pages 422 and 800 of the Army Report.
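
The translation of a score into a letter rating is a simple matter of looking up the limits in Table VIII. The following Python sketch shows the lookup; the names `RATING_LIMITS`, `letter_rating`, and `score_range` are ours and do not come from the Army Report.

```python
# Score limits for each letter rating, copied from Table VIII.
RATING_LIMITS = [
    ("E and D-", 0, 14),
    ("D", 15, 24),
    ("C-", 25, 44),
    ("C", 45, 74),
    ("C+", 75, 104),
    ("B", 105, 134),
    ("A", 135, 212),
]


def letter_rating(score: int) -> str:
    """Return the letter rating whose limits contain the given score."""
    for rating, low, high in RATING_LIMITS:
        if low <= score <= high:
            return rating
    raise ValueError("an Alpha score must lie between 0 and 212")


def score_range(rating: str) -> int:
    """Return the range of scores for a rating, as tabulated (high - low)."""
    for name, low, high in RATING_LIMITS:
        if name == rating:
            return high - low
    raise KeyError(rating)


print(letter_rating(140), score_range("A"), score_range("D"))
```

The ranges computed in this way agree with the third row of the table: the widest division is that for grade A and the narrowest that for grade D.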


The fact that 4.1 per cent of the men received grade A,
therefore, is no mystery. This number received this grade
because the range of scores was set at such a point that approximately
this number would receive it. It will be noticed
that the range of scores for grade A was 77, whereas
that for grade D was only 9. This is because the scores
piled up at the lower end and were much rarer at the higher
end. The larger percentage of scores will be seen to be in
the divisions at the middle of the scale, and the smaller percentage
toward the extremes. If the distribution were entirely
normal, there would be an equal percentage in the
corresponding divisions ranging from the middle toward the
extremes. It is customary to distribute marks or scores
in this fashion, and the arrangement of the scores so that
they will be so distributed simply means that it is assumed
that intellectual capacity occurs among an unselected group
of the population in some such form of distribution as
this.


1 Walter Lippmann. “The Mystery of the A Men”; in New Republic,
vol. 32, p. 248. 1922.
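
How nearly the distribution of letter grades approaches the symmetrical form may be checked by setting each division below grade C beside the corresponding division above it. The sketch below does this in Python with the percentages of Table VIII; the names `PER_CENT`, `ORDER`, `mirrored_pairs`, and `per_cent_below` are ours, introduced only for the illustration.

```python
# Per cent of the principal draft receiving each rating (Table VIII).
PER_CENT = {
    "E and D-": 7.1, "D": 17.0, "C-": 23.8, "C": 25.0,
    "C+": 15.2, "B": 8.0, "A": 4.1,
}
ORDER = ["E and D-", "D", "C-", "C", "C+", "B", "A"]


def mirrored_pairs():
    """Pair each division below C with the division equally far above C."""
    middle = ORDER.index("C")
    pairs = []
    for step in range(1, middle + 1):
        low, high = ORDER[middle - step], ORDER[middle + step]
        pairs.append((low, high, PER_CENT[low], PER_CENT[high]))
    return pairs


def per_cent_below(rating: str) -> float:
    """Total per cent of men in all divisions below the given rating."""
    lower = ORDER[: ORDER.index(rating)]
    return round(sum(PER_CENT[name] for name in lower), 1)


for low, high, low_pct, high_pct in mirrored_pairs():
    print(f"{low}: {low_pct}   {high}: {high_pct}")
print("below C:", per_cent_below("C"))
```

In each pair the lower division holds the larger percentage, so that the distribution is only approximately of the normal form, with somewhat more men below the middle division than above it.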

There has also been a great deal of confusion concerning 
the mental ages which were assigned to the various letter 
grades. The following table gives these corresponding 
mental ages: 


TABLE IX. MENTAL AGES CORRESPONDING TO THE LETTER
RATINGS IN ARMY ALPHA


Letter grades    E and D-   D          C-        C         C+        B           A
Corresponding
  mental ages    0-9.4      9.5-10.9   11-12.9   13-14.9   15-16.4   16.5-17.9   18-


From the table of the distribution of the letter grades of
the men in the principal sample given in Table VIII, it will
be seen that the sum of the groups below grade C makes a
total of 47.9 per cent. Nearly 50 per cent of the men, in
other words, were rated according to this scheme as below a
mental age of thirteen years. A similar method of figuring
gives an estimate of 40 per cent as being below a mental age
of twelve years. Now it has been the practice of psychologists
to interpret a mental age of twelve years, when the
individual is mentally mature, as representing marked dullness.
Reckoning a normal adult as having a mental age of
sixteen, one whose mental age was twelve would have an
I.Q. of 75 (12/16 × 100). If we refer back to Terman’s distribution
of I.Qs., we shall see that only between two and three
per cent of children have an I.Q. as low as this.
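
The arithmetic behind the figure of 75 may be written out in full. The symbols MA (mental age) and CA (the age used as divisor) are our shorthand and do not appear in the text; the divisor of sixteen is the mental age of the normal adult assumed in the passage above.

```latex
\[
\mathrm{I.Q.} \;=\; \frac{\mathrm{MA}}{\mathrm{CA}} \times 100
\;=\; \frac{12}{16} \times 100 \;=\; 75
\]
```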

This enormous discrepancy between the percentage of 40 
for adults and two or three for children leads us to inquire 
how the equivalent mental ages were determined. They 
were got in this fashion: A carefully selected group of men 
were given the Army Alpha, and also the Stanford-Binet. 
The Stanford-Binet mental ages of these men were found. 
By a comparison of these mental ages with the Alpha scores 
of the same men, the mental ages which are equivalent to the 
various Alpha scores were calculated. This procedure 
assumes that scores made by children and by adults on the 
same mental test represent equivalent mental capacities. 
The results of the army test seem to give conclusive evidence 
that this assumption is not correct. While the discrepancy 
may be explained in part by other minor factors, the chief 
explanation must be this lack of equivalence of the results of 
the test given to children who are in school and are accus- 
tomed to doing tasks similar to those demanded by the tests, 
and to adults who have been out of school for from six to ten 
years or more, and have lost a good deal of their adeptness 
for performing tasks which involve clerical skill. It is unsafe,
therefore, to interpret the mental age ratings of adults,
when they are obtained in this way, as meaning the same
thing as they have been found to mean in our experience
with children.

The methods by which the tests were chosen for inclusion 
in the Alpha scale are instructive from several points of view. 
In the first place, it should be emphasized that the tests 
were selected on the basis of careful preliminary trials, and 
of a statistical tabulation and interpretation of the results 
of these trials. Each test that was included in the final 
scale was subjected to careful scrutiny. The correlation 
technique which had previously been worked out and applied
in the study of single tests was used constantly. 

The procedure was to select for preliminary trial a series 
of tests which had given evidence by previous experimental 
work of correlating well with general intelligence. These 
tests were made up into a preliminary scale, called Scale A. 
They consisted of the following tests: oral directions, memory 
span, disarranged sentences, arithmetic problems, information, 
opposites, practical judgment, number completion, analogies,
and number comparison. Each of these tests was correlated 
individually with various other measures of intellectual 
capacity, such as officers’ ratings, scores in the Stanford- 
Binet scale, grade location in the school, and scores in other 
tests. At the beginning, the plan was to select tests which 
had high correlation with the outside criteria and low inter- 
correlation with one another. The reason for the plan to 
select tests which had a low intercorrelation was largely 
statistical. Such tests would not be measures of the same 
thing. A combination of tests with a low intercorrelation, 
but with a high correlation with the criterion, would from the 
purely statistical standpoint have a higher correlation with 
the criterion than a set of tests which measured the same 
thing and therefore correlated highly with each other. It 
turned out, however, that the psychological conditions were 
not in accordance with this statistical demand. The order 
of the tests as measured by their correlation with the cri- 
teria was almost identical with their order as measured by 
their intercorrelation. This is the same fact as was found by 
Burt. The detailed evidence of this statement will be pre- 
sented more fully in one of the chapters on technique. The 
evidence, then, seems to support the contention of Burt and 
of Spearman that those tests which are good measures of 
general capacity measure largely the same factor, or factors, 
of mental capacity. 
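
The purely statistical argument for choosing tests of low intercorrelation may be illustrated with a short Python sketch. It rests on a simplifying assumption of ours, not stated in the text: that the scale is the unweighted sum of k tests in standard measure, each of which correlates equally with the criterion, and every pair of which correlates equally with one another. The function name and the figures are hypothetical.

```python
import math


def composite_validity(k: int, r_criterion: float, r_inter: float) -> float:
    """Correlation of the sum of k standardized tests with a criterion.

    Assumes each test correlates r_criterion with the criterion and every
    pair of tests correlates r_inter with one another.
    """
    # Variance of the sum: k unit variances plus k(k-1) equal covariances.
    variance_of_sum = k + k * (k - 1) * r_inter
    if variance_of_sum <= 0:
        raise ValueError("these correlations cannot occur together")
    # Covariance of the sum with the criterion is k times r_criterion.
    return k * r_criterion / math.sqrt(variance_of_sum)


# Four hypothetical tests, each correlating .50 with the criterion.
low_overlap = composite_validity(4, 0.50, 0.20)
high_overlap = composite_validity(4, 0.50, 0.80)
print(round(low_overlap, 2), round(high_overlap, 2))
```

Under these assumptions the set of tests with the lower intercorrelation yields the higher correlation with the criterion, which is the statistical demand referred to above. The psychological fact reported in the text is that tests which meet both conditions at once were not found.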
5. The Army Scale Beta


In order to provide an examination which could be given 
to illiterates and non-English-speaking men of foreign birth, 
the non-language group test Beta was devised. The scale 
consists of a series of eight tests printed on a paper folder. 
Each test consists of a series of pictures or drawings which 
may be understood by a person without the aid of language.
The directions are given by means of pantomime. The 
nature of the scale may be grasped from a brief description 
of the particular tests. 


Test 1 — Maze test. This test consists of a series of lines which 
form five mazes. In each case the examinee is required to draw a 
line by the shortest route from the left-hand side to the right-hand 
side of the maze, without going into any blind alleys. This test 
was suggested by an earlier one devised by Porteus.¹

Test 2 — Cube Analysis. This test consists of a series of draw- 
ings, each one of which represents a series of cubes piled upon one 
another in regular fashion. Some of the cubes are hidden from 
view and the examinee is required to tell how many cubes are in 
the pile. 

Test 3— XO Series. This test consists of a series of arrange- 
ments of the letters X and O. At the end of each series are a 
number of blanks which are to be filled out according to the same 
arrangement. 

Test 4— Digit Symbol. A substitution test similar to that used 
by Healy, Pyle, and others. 

Test 5— Number Checking. This test consists of a series of 
pairs of numbers, beginning with short ones and ending with long 
ones. The subject is required to check those pairs which are alike. 

Test 6 — Pictorial Completion. A series of pictures, each with 
one part left out, which is to be supplied by the examinee. 

Test 7 — Geometrical Construction. This is derived from the 
form-board test. It consists of a number of items. Each item con- 
tains a square and a number of figures which, when put together in 
the proper way, compose the square. The subject is to draw a 
line in the square to indicate how the figures might be arranged 
in it. 


1 S. D. Porteus. “Mental Tests for Feeble-Minded: A New Series”; in
Journal of Psycho-Asthenics, vol. 19, pp. 200-13. 1915.
Since the Beta scale is the prototype of the group non-language
scales, as Alpha is the prototype of the group
language scales, it is reproduced in full. 

This scale was found to be reasonably satisfactory, al- 
though it did not give as accurate measurements as the 
Alpha scale. It was difficult to give the directions by pan- 
tomime, and variations in procedure were likely to occur. 
It has stimulated the development of a considerable num- 
ber of similar scales for application to school children, par- 
ticularly those for the primary grades in which children 
cannot read readily. With school children, of course, the 
handicap of inability to give directions orally does not exist. 


6. The performance scale examination 


In case the recruit had made a low score on the Alpha 
scale and the Beta scale, he was given one of the three 
individual examinations. In addition to the Stanford Re- 
vision and the Yerkes Point Scale, a performance scale 
was devised. This consists of tests which require the in- 
dividual to react to problems which are presented, not in the 
form of words, but of concrete objects. In some cases these 
objects are drawings, and in other cases they are composed 
of solid objects. The nature of the scale may be briefly described.


Test 1 — The Ship Test (Knox). This consists of a rectangular 
picture pasted on a thin board, and cut up into ten pieces. The 
pieces are to be arranged by the subject so as to make the picture. 
(See p. 116 for illustration.) 

Test 2 — Manikin (Pintner) and Feature Profile (Knox). These 
tests were derived from the series by Knox and Pintner already referred
to. They are simple construction puzzles, one representing
a face and the other a man.

Test 3 — Cube Imitation (Knox). This is the Knox test already 
described. 

Test 4 — Cube Construction (Goddard). This test requires that
the subject shall put together small cubes painted on certain surfaces
in such a way as to make a larger block painted on certain of
its surfaces.

Test 5 — Form Board (Dearborn). This is somewhat similar in 
its make-up to construction puzzle B of the Healy-Fernald series. 
Both the original and the revised form used in the army were devised
by Dearborn and his associates.¹

Test 6 — Designs (Terman). A series of figures are shown to the 
subject which he is to copy from memory as nearly as possible. 

Test 7 — The Digit-Symbol Test. The same test as was used in
Beta. 

Test 8— The Maze (Porteus). These mazes are similar in 
principle to the ones used in Beta. 

Test 9 — Picture Arrangement (Bowler, Whipple). A series of 
“Foxy Grandpa” pictures placed out of order. They are to be
placed in order so as to make the sequence. 

Test 10 — Picture Completion (Healy). Similar to the last picture
completion test of the Healy series.


A third individual test, which was given in a few special
instances to test mechanical ability, was the Stenquist
Mechanical Skill Test. This test was described in the
chapter on tests of special capacity.


7. The uses of mental tests in the army 


The psychological committee planned to use mental tests 
primarily to detect drafted men who were too low-grade 
mentally to make satisfactory privates, to discover those 
who were mentally unstable and might prove incorrigible, 
and if possible to select exceptional men who might be used 
for tasks demanding a high degree of intelligence. The uses
to which the tests were actually put are classified briefly by
Yoakum and Yerkes, as follows:²


1 W. F. Dearborn, J. E. Anderson, and A. O. Christiansen. “Form Board
and Construction Tests of Mental Ability”; in Journal of Educational
Psychology, vol. 7, pp. 445-58. 1916.

2 Yoakum and Yerkes, op. cit., pp. xii and xiii. 
1. The assignment of an intelligence rating to every soldier on
the basis of systematic examination. 

2. The designation and selection of men whose superior intel- 
ligence indicates the desirability of advancement or special 
assignment. 

3. The prompt selection and recommendation for development 
battalions of men who are so inferior intellectually as to be 
unsuited for regular military training. 

4. The provision of measurement of mental ability which ena- 
bled officers to build organizations of uniform mental strength 
or in accordance with definite specifications concerning in- 
telligence requirements. 

5. The selection of men for various types of military duty or for 
special assignment, as for example, the military training 
schools, colleges or technical schools. 

6. The provision of data for the formation of special training 
groups within the regiment or battery, in order that each man 
may receive instructions suited to his ability to learn. 

7. The early discovery and recommendation for elimination of 
men whose intelligence is so inferior that they cannot be used 
to advantage in any line of military service. 


The use of the tests as one of the means of selection of 
officers is based upon the superiority of officers in the test
ratings. The distribution of the scores of different groups 
of men is shown in Fig. 6. 

As the result of the experiments with the tests the follow- 
ing summary is given: During a specimen six months’ period, 
one half of one per cent were reported for discharge because 
of mental inferiority, six tenths of one per cent were recom- 
mended for assignment to labor battalions because of low- 
grade intelligence, and six tenths of one per cent were re- 
commended for assignment to development battalions, in 
order that they might be more carefully observed and given 
preliminary training. The purpose of this training was to 
discover means of giving the men training which would fit 
them to be useful soldiers. The army psychologists believed
that there were nearly three per cent of the men who were
so low-grade mentally that they were not of sufficient service
to compensate the Government for the expense necessary
to equip and train them for service.


[Fig. 6. … OFFICER MATERIAL]

From Yoakum and Yerkes, Army Mental Tests. Henry Holt & Co., 1920. By permission
of the publishers.


Among the directions
which were issued by the psychological service for the use of
results of the psychological examinations, the following will
throw additional light upon the application of these tests in
the army:

First, the tests were not designed to be a substitute for 
other methods of judging a man’s value to the service. They 
were not intended to measure character traits, such as 
“loyalty, bravery, power to command, or the emotional 
traits that make a man carry on.” Intelligence, however, 
was regarded as the most important single factor in effi- 
ciency. Second, it was expected that commissioned officers 
would be found chiefly among the men who received the 
grades of A or B. Men with grades below C+ were expected
rarely to have the capacity for success in officers’ training
schools. Non-commissioned officers, furthermore, were
expected to be chosen chiefly from the men whose grades
were C+ or higher.
In selecting men for positions of special responsibility
which corresponded to particular occupations of civil life, 
those men were first to be selected whose intelligence rating 
was above the average of men in that occupation. The 
intelligence ratings of the men in the army were classified 
according to their civil occupations. 
[Fig. 7]

From Yoakum and Yerkes, Army Mental Tests. Henry Holt & Co., 1920. By permission
of the publishers.


It was directed that men be assigned to permanent organizations
with a view to making these organizations equal
in average intelligence. The only exception to this was the
case of certain arms of the service which were found to require
greater than the average intelligence. Such arms were
the signal corps, machine guns, field artillery, and engineers. 
The variation in the rating of men in different arms of the 
service is shown in Fig. 7. 

It is evident that the ability which is measured by the 
tests existed in different degrees in officers and men, and in 
the men of different branches of the service. The tests con- 
stituted, therefore, one of the means by which the men who 
were fitted for the successful performance of different 
functions in the army might be selected. 


CHAPTER VII 
SURVEY OF GROUP POINT SCALES 


In this chapter we shall discuss, first, the main facts con- 
cerning the recent development of the group point scales 
which are now available for use in the schools. We shall next 
consider the criteria which should be kept in mind in the 
choice of a scale to be used in the school. Finally, we shall 
present in tabular form the chief facts which are available 
concerning the chief existing group point scales. 


1. Recent development of group tests 

The army testing work bore fruit very rapidly in group 
point scales for use in schools and colleges. The War had 
not closed when Otis published his advanced examination. 
He had been working upon this test before the War opened, 
and published it in May to June, 1918. Within five years 
there have appeared approximately fifty such scales for 
schools and colleges. The Otis scale contains ten tests. 
It requires a full hour to give and is designed for the high 
school. It has had rather wide application, but is being 
displaced by other tests which do not require so much time 
and can more easily be given, among them Otis’s own Higher 
Examination. 

In 1917 Whipple began an elaborate study of
the value of many of the various single tests which had been 
developed up to that time, for the purpose of selecting chil- 
dren for a special class for gifted pupils. At the completion 
of this study, which was reported in 1919, in his Classes 
for Gifted Children, he organized the tests which he found 
to be most suitable into a series of group tests. These con- 
stitute his Group Tests for Grammar Grades. 
At about the same time as the Otis test the Group Point
Scale for Measuring General Intelligence was published by
S. L. and L. W. Pressey.¹ The Pressey test was also designed
for use in the high school and has had rather wide applica- 
tion, particularly in surveys of the secondary schools of 
Indiana. It has sometimes been used for testing applicants 
for college entrance. 

In 1919 and 1920, respectively, there appeared, as a direct 
outgrowth of the army tests, the Haggerty Delta 1 and Delta 
2, and the National Intelligence Test. Haggerty had been
engaged with the psychological service in the army, al- 
though not in the psychological testing. He worked out the 
two above mentioned scales in connection with the Virginia 
School Survey. Delta 1, one exercise of which is reproduced 
in Fig. 8, recalls the Army Beta Test, which will be remem- 
bered as a non-language test. Delta 2 recalls Army Alpha. 
These scales were carefully adapted to the mental develop- 
ment of children in the primary grades and in the upper 
grades respectively. 

The National Intelligence Tests were worked out by a 
committee consisting of Haggerty, Terman, Thorndike, 
Whipple, and Yerkes. This committee was granted the 
sum of $25,000 by the General Education Board to conduct
researches and to devise tests which should be more highly 
refined than was possible without such extensive investiga- 
tion. There are two scales, Scale A and Scale B, and two 
parallel forms of each scale. Other parallel forms are in 
process of development. The constituent tests are for the 
most part similar in character to those in Army Alpha. A 
feature which distinguishes these scales from most others is
that each test is preceded by a practice exercise. Partly on
account of the prestige of the committee which organized the
tests, they have had very wide application, and the norms
which are furnished for them are based upon large numbers
of children, about four thousand for each grade or age. It
is not known, however, whether the test is more valid or
more reliable than other similar scales.


1 The references for these and the other group tests which are to be
mentioned will be found in the table in the latter part of this chapter.


FIG. 8. ILLUSTRATION OF PART OF A NON-LANGUAGE TEST FOR
PRIMARY GRADES

From The Haggerty Intelligence Examination, Delta 1. (Reproduced with the permission
of the author.)
A group of scales which appeared in 1919 may be mentioned
because they possess the common characteristic of
being in non-language form. The Myers Mental Measure 
is a brief test covering four pages, and designed to furnish 
intelligence ratings on individuals all the way from kindergarten
to the graduate school. It is therefore very steeply
graded; each of the four tests contains some very easy and
some very difficult items. The Pintner Non-Language Test 
is suited to the upper grades of the elementary school and is 
designed particularly for the purpose of testing children who 
are not familiar with the English language, either because of 
foreign extraction, or because of deafness or some other dis- 
ability. It is also designed to supplement the language 
scales. The Thorndike Non-Language Test was designed 
for adults and aims to provide a score which shall be based, 
not on language ability, but on ability to deal with problems 
presented in more concrete form. 

The Primary Tests, like the first one which was devised 
by Haggerty, are of the non-language type. This is, of 
course, necessary, since children in the primary grades either 
cannot read, or read so haltingly that a printed group test 
in the form of language would be almost entirely a test of 
reading ability. A number of excellent tests for the primary 
grades have been published on the same general model as the 
Haggerty Tests. It would be invidious to mention any of 
these especially on the ground of general merit, but two of 
them may be singled out because they contain somewhat 
novel features. The first of these is the Kingsbury Test. 
The Kingsbury Test contains four component parts. The 
first of these is a more or less conventional directions test. 
The other three were consciously designed to represent, in 
the form of pictures, the mental processes which have been 
found to give the best results in general intelligence tests. 
The test which is illustrated in Fig. 9 is a completion test, 
made after the analogy of the language completion test. 
Each figure contains a series of drawings, with blank spaces 
which are to be filled in by the child. For example, one of 
the easiest contains a series of circles of progressive size; the 
last space is left vacant and in it the child is to draw a circle 
larger than the one preceding. 

The other primary test which may be given special men- 
tion is the one devised by Dearborn. The general character 


Fig. 9. Illustration of One Exercise of the Kingsbury Primary Test 


Designed especially to duplicate pictorially the situations set up in the most successful language tests. 


(Reproduced with the permission of the author.) 
of the test is suggested by its title, Games and Picture Puzzles. 
The tests are evidently designed to be as interesting to the 
child as possible. The fact that he is taking a test is not 
emphasized, and the attempt is made to appeal to his interest 
rather than to the extraneous motive of making a high 
score. Another characteristic of this test is that it contains 
a somewhat greater variety of types of tests and is more 
elaborate and longer than some of the other scales. It con- 
tains some of the tests which have been derived from Army 
Beta, but contains also some invented by the author and not 
used in other scales. The test for the upper grades by the 
same author is of the same general nature as the one for the 
primary grades which has just been described. 

Group tests were first designed for individuals of the 
adolescent period or the adult. They were then designed 
for younger children of the upper grades of the elementary 
school, then for primary children, and finally scales have 
been devised particularly for children on entering school, or 
in the kindergarten period. The Cole-Vincent test is spe- 
cifically designed for children upon entering school. The 
Detroit Kindergarten Test, while not a group test, is so 
easily given that it consumes only a short period of time. 

All of the tests which have been mentioned thus far 
follow the organization of the army group test, that is, the 
scale is made up of a series of graded tests and all the similar 
items are segregated. For example, the arithmetic tests 
are all together, and so on. Furthermore, separate direc- 
tions are given for each of the individual tests of the scale 
and each one is timed separately. In 1919 Thurstone put 
out a test for high school graduates and college students 
which was designed on a different plan. This scale was il- 
lustrated in Chapter I. It contains a variety of tests, and 
these tests are not segregated, but arranged in rotation, or in 
cycles. Easy examples of the various tests are placed at 
the beginning of the series, then slightly harder examples, 
and so on. When a new variety of test is introduced, it is 
explained by an example. Thus the test is self-explanatory. 
The examiner gives the directions once for all at the begin- 
ning, and keeps time for the entire scale as a unit instead of 
timing the various parts separately. The Army Alpha 
Scale has been arranged in this spiral fashion and is called 
Scrambled Alpha. Otis has more recently adopted this form 
of organization in his Intermediate and Higher Examina- 
tions. 

Among tests for entrance to college, the most widely used 
are: The Thurstone test, Psychological Examination for 
College Freshmen and High School Seniors, Test IV; the 
Colvin test, Brown University Psychological Examination; 
and the Thorndike Intelligence Examination.¹ Of these the 
Thorndike test differs most widely from the others and also 
from tests which have been designed for other purposes. 
It is unique, in the first place, in its length. While most 
tests can be given in an hour or less, the Thorndike Examina- 
tion requires three hours. It is peculiar, in the next place, 
because it includes not only the usual general intelligence 
material, but also subject-matter drawn from high school 
subjects. This subject-matter is introduced in order that 
students who are bright but who have had poor preparation 
may not make an unduly high grade. This feature of the 
test makes the scale adapted only to the examination of 
students who have been through the high school. In ad- 
dition to the examination in the content of high school in- 
struction, the scale contains difficult reading tests. Another 
unusual feature of the examination is the multiplicity of the 


1 Since the above was written a college freshman test has been pre- 
pared for the American Council on Education by a committee under the 
chairmanship of Thurstone. This is a two-hour test and is being widely 
used. 
parallel forms which have been produced. There are fifteen 
or more forms of each of the parts of the tests now in exist- 
ence, and others will be prepared. The Thurstone test 
comes next to the Thorndike test in preparing a large num- 
ber of parallel forms. Most tests have but one or two. The 
purpose of the multiplicity of forms is to make it impossible 
for an individual to be coached up upon the test. Finally, 
this examination differs from most others in its emphasis 
upon quality of performance and upon endurance as dis- 
tinguished from speed and alertness. The quality of per- 
formance is measured by giving a liberal time allowance and 
arranging a series of steeply graded steps of difficulty. 
Endurance is tested, of course, by the length of the examina- 
tion. It seems very desirable to measure these character- 
istics. Whether the additional gain from the greater length 
of the test is enough to pay for the expenditure of time and 
effort in giving, taking, and scoring the test is a matter upon 
which all workers in the field are not in agreement. 

Some of the principles which are exemplified in the Thorn- 
dike Intelligence Examination are carried a step farther in 
the Roback Mentality Tests for Superior Adults. These 
principles are discussed in his article, “Subjective vs. Ob- 
jective Tests.”¹ The tests are more difficult than those of 
most scales, even for adults. They are more steeply graded, 
and the score depends less upon speed of performance. The 
response is not a choice among definite alternatives, but the 
subject is given free scope in his answer. The scoring is 
therefore more a matter of judgment than in most scales. 
The author of the test believes, probably correctly, that a 
test possessing these characteristics is a better measure of 
superior capacity than is the usual type of group test. 

A scale which cannot be included in the following table 


1 A. A. Roback. “Subjective vs. Objective Tests”; in Journal of Educa- 
tional Psychology, vol. 12, pp. 439-44. 1921. 
because it is still in process of standardization, but which 
deserves mention on account of its notable character, is the 
International Test. This test is being worked out by Carl 
C. Brigham and Stuart C. Dodd in connection with the work 
of the Migrations Committee of the National Research 
Council. It is entirely non-language, including the instruc- 
tions, and the responses are all made by setting rotating 
pieces of cardboard so as to bring outline drawings into a 
given relationship with each other. The test promises to 
be most nearly universal and independent of training and 
cultural background of any thus far devised. 

A number of scales have been devised in which an effort is 
made to measure the relationship between capacity and 
performance. The first scale in which this measurement 
was attempted was the Illinois Examination, which was 
published in 1920. This examination consists of two parts. 
One part is made up as an ordinary intelligence examination, 
and the other part consists of an examination of two of the 
school subjects, arithmetic and reading. In scoring this 
test the two parts are kept distinct. The intelligence score 
is found in terms of mental age, and the achievement score 
in terms of achievement age. Mental age means the same 
thing as in any intelligence test. Achievement age is found 
by comparing the score which the pupil makes with a series 
of achievement norms. These achievement norms are the 
median scores made by the pupils of the various mental 
ages. (The usual practice is to use chronological ages.) 
After the mental age and the achievement age are found, 
the next step is to divide the pupil’s achievement age by his 
mental age. This gives the achievement quotient. The achieve- 
ment quotient then is the ratio between what the pupil 
accomplishes and what it is assumed he could accomplish 
because of his intelligence rating. Other tests which are 
divided into two parts, the one an intelligence test, and the 
other a subject-matter test, are the Mental-Educational 
Survey Test by Pintner, and the New Jersey Composite 
Test. 
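
The arithmetic of the achievement quotient may be made concrete by a short sketch. The function name and the ages used are hypothetical and are not taken from the manual of the Illinois Examination; ages are expressed in months merely for convenience.

```python
def achievement_quotient(achievement_age_months, mental_age_months):
    """Ratio of achievement age to mental age (both in months)."""
    if mental_age_months <= 0:
        raise ValueError("mental age must be positive")
    return achievement_age_months / mental_age_months


# Hypothetical pupil: mental age 10 years, achievement age 9 years.
aq = achievement_quotient(9 * 12, 10 * 12)
print(round(aq, 2))  # 0.9
```

A quotient below 1, as here, would indicate that the pupil is accomplishing less than his mental age is assumed to permit; a quotient above 1 would indicate the reverse.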

The most recent test of this character is the Otis Classi- 
fication Test. The test is made up of two parts, the first 
part an Omnibus Achievement Test, and the second an Omnibus 
Mental Test. The arrangement is the same as in the 
author’s self-administering test. 

While it is, strictly speaking, only an achievement test 
and therefore does not belong in the discussion of mental 
tests, the Stanford Achievement Test may be mentioned here 
because it furnishes a composite measure of achievement 
and may be used in combination with a mental test in a simi- 
lar fashion to the Illinois Examination or the Pintner or 
Otis test. 

This brief survey has not been at all exhaustive, and many 
tests have not been mentioned which are perhaps as serv- 
iceable as those which have been singled out for special 
discussion. The selection of the tests which have been 
mentioned is based largely either upon their historical 
importance or upon the fact that they contain unique or 
uncommon features. 

We may turn next to a consideration of those character- 
istics which are important in determining the selection of a 
test. 


2. Criteria for the choice of a mental test 

Price. The price of mental tests varies greatly from one 
cent apiece at one extreme to one dollar at the other. 
Since the price of the test may be a determining factor in the 
decision whether a testing program can be launched at all, 
this is an important consideration. Furthermore, 
the value of the service of tests for a particular purpose does 
not always vary directly with price. The price range of the 
majority of the group tests at the present time is 
from four to six cents apiece. The price of a test should be 
considered in one of two circumstances. First, in case two 
or more tests are equal in other respects but differ in price; 
or second, in case there is a definite limitation upon the 
appropriation for the testing program. However, in calcu- 
lating the cost of the entire administration of the test, other 
items must be considered, such as the time required to ad- 
minister or to score the test. These will be mentioned 
below. 

Completeness and convenience of material, and fullness, 
simplicity, and clearness of directions. There has been such 
keen competition in recent years in the production of stand- 
ardized tests that the material and directions for the tests, 
including the materials necessary to score them easily and 
quickly, have come to be pretty thoroughly standardized. 
Unless there is some very strong reason to the contrary, it 
is by all means advisable to select a test for which the ma- 
terials and directions have thus been worked out. 

Adaptation to ages or grades to be tested. In the tabulated 
list of tests which is given below they are classified roughly 
according to the periods of school in which they are designed 
to be used. The tests, in general, may be classified as be- 
longing to one of five periods: pre-school, primary grade, 
upper grade, high school, and college. In some cases the 
pre-school and the primary period may be served by the 
same test, and in some cases the same test may be used in 
the high school and in the college. A few tests have been 
devised which are designed to cover a wider range in ages 
than is common. Such tests must either be made longer 
than the ordinary test, or they must have fewer items which 
are suited to the stage of development at a particular age. 
In the second case the test would be less reliable because it is 
based upon a smaller number of items. In general it is probably de- 
sirable to use tests which are adapted to a fairly narrow 
range. 

Appeal to the child. The typical modern intelligence test 
is very interesting to the child. Unless there is some special 
condition which makes the child nervous, he enjoys taking a 
test, particularly if he is at all accustomed to it. It is not 
at all difficult, therefore, to find tests which will be entirely 
satisfactory from this point of view for any stage of develop- 
ment. 

The content of the test. This refers to the subject-matter of 
the tests, or perhaps to the mental processes which they are 
supposed to measure. This feature is not a very important 
practical consideration in the choice of tests. As we shall 
see in the discussion of the content of the tests, in the chapter 
on technique, various tests have been shown to be of about 
equal value as constituents of an intelligence scale. Further- 
more, most of the scales use, at least in part, the same tests. 
We cannot, as it has sometimes been thought, determine just 
what specific mental capacity is measured by a particular 
test. In fact, it is probable that each of the tests measures 
a variety of mental capacities. 

The length of the scale. This characteristic was mentioned 
in the discussion of the Thorndike examination, and will be 
discussed from the technical point of view in Chapter X. 
The length of the scales varies considerably, from those 
which contain only four or five tests, and which can be 
given in fifteen or twenty minutes, to those which require 
from an hour to three hours to give. Theoretically, up to a 
certain point the increase in the length of a test adds to its 
reliability and therefore to its validity. This is because 
chance errors are diminished by an increase in the number of 
responses which the child makes. For example, if the test 
calls for the possession of a number of items of information, 
there is a chance that a given individual might possess or 
fail to possess a given item accidentally. The crucial ques- 
tion is, What is the point at which we begin to have rapidly 
diminishing returns from an increase in the length of the 
test? Apart from the theoretical question, it may be neces- 
sary in some cases to select a short test for practical reasons. 
Again, the length of a test should be measured not so much 
by the total amount of time required to give the test, as by 
the amount of time the pupil actually spends in working 
upon it. A test which is organized on the omnibus or spiral 
plan is more economical of time than one which is made up 
of a series of segregated tests, each one of which has to be 
presented to the class individually. So far as length is a 
factor in reliability, the reliability of the test depends, not 
upon the amount of total time required to give it, but upon 
the amount of time the pupil actually spends upon it and the 
number of items of which it is composed. 
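
The text raises the question of diminishing returns without giving a formula. One standard way of putting rough numbers upon it is the Spearman-Brown formula, which is not named in this chapter and is brought in here only as an illustration. The starting reliability of .60 is hypothetical, and the formula assumes that the added items are comparable to the original ones.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by `length_factor`
    using comparable items (Spearman-Brown formula; an assumption here,
    not a method given in the text)."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


base = 0.60  # hypothetical reliability of the short test
for k in (1, 2, 3, 4, 6):
    print(k, round(spearman_brown(base, k), 3))
```

With these assumed figures, doubling the test raises the predicted reliability from .60 to .75, while the further step from four times to six times the original length adds only about .04.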

Ease and simplicity of administration. This refers to the 
ease of preparing to give the test and of presenting it to the 
class. Tests differ in this respect, although most of the newer 
tests are very easy to prepare and to administer. In general, 
the omnibus or self-administering tests are much simpler to 
give than the tests in which the items are segregated. 

Simplicity of response. The typical group test requires 
a very simple response. It may involve underlining a word 
or making a mark upon a drawing — rarely anything more 
complicated than this. In some cases, such as the comple- 
tion tests, it involves writing a word in a blank. In the 
search for tests which should demand only a very simple 
response, psychologists have devised certain forms which 
meet these requirements. Among these are the yes and no 
tests, the multiple answer tests, completion tests, and the 
cross-out tests. In general, these tests have proved very 
useful; but whether all of them are successful in requiring 
thoughtful consideration on the part of the person being 
examined is perhaps a question. The question is most 
pressing in reference to the yes and no, or right and wrong 
tests. It is commonly recognized that these tests tend to 
encourage guessing, since it is possible to obtain a correct 
answer in half of the cases by guessing. In fact, the answers 
to these tests are commonly subjected to a correction by 
deducting from the number of correct answers a number 
equal to the number of wrong answers. Doubt has been 
cast upon this procedure by recent investigations, and it 
seems reasonable to say that tests or scales which contain 
tests of other sorts than the yes and no type of test are so 
far to be preferred. 
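
The correction described above, rights minus wrongs, may be sketched as follows. The pupil and the number of items are hypothetical, and omitted items are assumed to count neither as right nor as wrong.

```python
def corrected_score(right, wrong):
    """Rights-minus-wrongs correction for two-choice (yes and no) items."""
    return right - wrong


# Hypothetical 40-item test: the pupil knows 20 answers and guesses
# on the other 20, getting half of the guesses right by chance.
known, guessed = 20, 20
right = known + guessed // 2    # 30 marked correctly
wrong = guessed - guessed // 2  # 10 marked incorrectly
print(corrected_score(right, wrong))  # 20
```

In this idealized case the corrected score equals the number of items actually known. The correction rests on the assumption that guesses fall evenly between right and wrong, which is the point upon which the recent investigations mentioned above have cast doubt.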

Ease and definiteness of scoring. The publishers of most 
of the prevailing group tests furnish with them stencils or 
other means by which scoring can be quickly and easily 
done. In fact, most of these tests can be scored as well by 
an accurate clerk as by a psychologist. The estimate of 
the amount of time required to score the test is given for 
those tests for which the information is available in the 
table below. This should be considered an important item, 
in addition to the price, in determining the cost of a testing 
program. 
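
A rough reckoning of the cost of a program, combining the price of the blanks with the time spent in scoring, might run as follows. All figures are hypothetical (the clerk's wage in particular), and amounts are kept in cents.

```python
def program_cost_cents(pupils, price_cents, scoring_minutes, clerk_cents_per_hour):
    """Total cost in cents: test blanks plus clerical scoring time."""
    materials = pupils * price_cents
    scoring = pupils * scoring_minutes / 60 * clerk_cents_per_hour
    return materials + scoring


# Hypothetical program: 1000 pupils, a five-cent test, three minutes
# of scoring per paper, and a clerk paid sixty cents an hour.
total = program_cost_cents(1000, 5, 3, 60)
print(total / 100)  # 80.0 dollars
```

On these assumed figures the scoring accounts for three eighths of the whole cost, which is why the scoring time deserves attention alongside the price.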

Norms. Among the uses which may be made of the scores 
in a test is the comparison of the scores of an individual or 
of a group with a standard or a norm which has been estab- 
lished by giving the test to a large number of persons. The 
use of such norms often raises rather difficult questions of 
interpretation. For example, if a school is in a poor district, 
the scores of a majority of the children will probably be 
below the norm. The question is, How shall this deficiency 
be interpreted? Furthermore, if the group is in general 
below or above the norm, and if we represent the individual’s 
score in terms of his relation to the norm, the distribution 
of the scores will be lopsided. For purposes of internal 
administration, therefore, it is probably more useful to 
compare the scores of individuals with the averages or 
medians of their own group rather than with an outside 
norm. However, since norms are useful in some cases, the 
validity of the norm is a question to be taken into account. 

The validity of the norm, in general, depends upon the 
number of cases and upon their selection. The larger the 
number of cases the more stable will be the norm. The 
selection of cases should be such that the individuals do not 
belong predominately to one or another class of the popula- 
tion, but represent the different classes in the same pro- 
portion as they exist in the population as a whole. This 
may refer to location, to social level, or to age. In the case 
of age, it is very difficult to get properly selected cases for 
the ages in the adolescent period, because we usually test 
children in school, and those who remain in the school con- 
stitute only a part of the respective age groups. Further- 
more, those who have dropped out of school are usually not 
equal in intellectual ability to those who remain. The 
norms of a test, therefore, should be judged with reference 
to the number of cases which are used in deriving each norm, 
and with reference to the way in which the cases have been 
selected. In the chapter on technique we shall have to 
consider the relation between age norms and grade norms and 
the significance of these two types. 


The use of an appropriate relative or brightness score. We 
have already spoken of the methods of calculating bright- 
ness or relative ability by means of the intelligence quotient, 
or the coefficient of intelligence. Other scores have been 
used in connection with certain of the scales. For example, 
in his original group test, the Advanced Examination, Otis 
used a score which he calls the index of brightness, or I.B. 
He finds this by comparing the individual’s score with the 
norm for his age, and then adding the difference between 
the score and the norm to one hundred, or subtracting it 
from one hundred, according as the difference is plus or 
minus. This gives a score which is similar in appearance 
to the I.Q., and has some relationship with it in meaning, 
but which is not identical with it. It may be seen from a 
casual inspection that there is a fundamental difference in 
principle between the assumption underlying the coefficient 
of intelligence, for example, and the index of brightness. In 
the case of the coefficient of intelligence, a difference in 
scores of the same amount would produce a different coeffi- 
cient for succeeding years, because the norm increases in 
amount. In the case of the index of brightness, on the other 
hand, a given deficiency or excess in score would produce the 
same index of brightness at different ages. An analysis 
would show also that the index of brightness involves a dif- 
ferent assumption in regard to the mental growth and the 
distribution of scores than is assumed by the intelligence 
quotient. The question which is necessary to raise here is 
whether the particular index which is used in a given test is 
justified by the nature of the distribution of the scores which 
are obtained from it. The literature which describes the 
test should give evidence that the author has considered 
carefully the psychological and statistical principles which 
underlie the form of scoring which he recommends. 
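
The difference in principle between the two indices may be seen in a small sketch. The coefficient of intelligence is taken here to be the ratio of the score to the age norm, multiplied by one hundred; this definition is an assumption, since it is not restated in the present passage. The age norms are hypothetical.

```python
def index_of_brightness(score, age_norm):
    """Otis's I.B. as described above: 100 plus the difference
    between the score and the norm for the pupil's age."""
    return 100 + (score - age_norm)


def coefficient_of_intelligence(score, age_norm):
    """Ratio of score to age norm, times 100 (assumed definition)."""
    return 100 * score / age_norm


# Hypothetical norms; the pupil falls 10 points below the norm each year.
for age, norm in [(10, 50), (12, 80), (14, 100)]:
    score = norm - 10
    print(age,
          index_of_brightness(score, norm),
          round(coefficient_of_intelligence(score, norm), 1))
```

In this made-up case the index of brightness stands at 90 at every age, while the coefficient rises from 80 to 90 as the norm grows, which is the contrast drawn in the text.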

Directions for tabulating the results of a test. In some cases 
the manuals which go with tests present careful directions 
for plotting the distribution and calculating the various in- 
dividual scores which are derived from it. In some cases 
the directions for tabulating and for calculating the scores 
of individuals are presented in graphic form, which makes 
the calculation very rapid and easy. 

External criteria of the value of the test. The criteria which 
are here referred to are of a statistical sort, and are derived 
from the application of the test and the tabulation of the 
results. The first criterion which is commonly applied has 
to do with the distribution of the scores. If a test has a range 
of difficulty which is suitable to the different degrees of 
ability within the group which is to be tested, the scores will 
be distributed in fair conformity with the normal probability 
curve. That is, the greatest number of scores will be near 
the average, and the frequency of the scores will decrease 
at the same rate above and below the average. The suit- 
ability of a test for the various ages or grades to which it is 
to be applied may be examined in part by tabulating the 
distribution of the scores, and examining them to see whether 
they approach the normal distribution. 
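
One rough way of examining a tabulated distribution is to compute its skewness, which is near zero when the scores fall off at the same rate on both sides of the average. This is a sketch only: the measure is not prescribed by the text, and both lists of scores are hypothetical.

```python
from statistics import mean, pstdev


def skewness(scores):
    """Third standardized moment: about 0 for a symmetrical
    distribution, positive when the scores pile up at the low end
    and trail off toward the high end."""
    m, s = mean(scores), pstdev(scores)
    return sum(((x - m) / s) ** 3 for x in scores) / len(scores)


symmetrical = [1, 2, 2, 3, 3, 3, 4, 4, 5]   # hypothetical scores
too_hard = [1, 1, 1, 2, 2, 3, 5, 9]         # most pupils score low

print(round(skewness(symmetrical), 3))
print(round(skewness(too_hard), 3))
```

A markedly positive value suggests a test too hard for the group, and a markedly negative value a test too easy; a value near zero is consistent with the normal curve but does not by itself establish it.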

A second criterion concerns the progression of the average 
or median score for the successive age groups to which it is 
to be applied. The average should progress uniformly, at 
least throughout the ages below the adolescent period. In 
some cases the average has been found to progress at about 
the same rate up to middle adolescence, but in the majority 
of cases the increment at each age in this period is some- 
what less than it is in the preceding period. It is a reason- 
ably safe rule to consider a test unsuited for particular ages 
if the average score for the successive ages advances either 
much more rapidly or much less rapidly than for the other 
ages which are tested. 
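
The rule just stated may be applied mechanically by comparing the gain in median score at each age with the typical gain. The sketch below is an illustration only; the medians, the function names, and the tolerance of one half are all hypothetical choices.

```python
from statistics import median


def age_increments(median_by_age):
    """Gain in median score between each pair of successive ages."""
    ages = sorted(median_by_age)
    return {(a, b): median_by_age[b] - median_by_age[a]
            for a, b in zip(ages, ages[1:])}


def irregular_steps(median_by_age, tolerance=0.5):
    """Steps whose gain departs from the typical gain by more than
    `tolerance` times that typical gain (an arbitrary threshold)."""
    gains = age_increments(median_by_age)
    typical = median(gains.values())
    return [step for step, gain in gains.items()
            if abs(gain - typical) > tolerance * abs(typical)]


# Hypothetical median scores by age.
medians = {8: 20, 9: 30, 10: 40, 11: 50, 12: 52}
print(irregular_steps(medians))  # [(11, 12)]
```

Here the test would be suspected of being unsuited to twelve-year-olds, since the median advances by only two points where ten is usual.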

The final statistical criterion, of course, is derived from the 
correlations of the test. As we have already seen, the test 
should show a high degree of correlation with itself when it 
is repeated, and a high degree of correlation with some out- 
side measure which is assumed to measure the trait in ques- 
tion. Most of the composite group tests which are on the 
market do not differ very greatly from one another in 
these respects, but it is at least a mark of the care with which 
the test has been worked out when the author furnishes the 
measures of reliability and of validity. 
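
The two correlations referred to, that of the test with itself on repetition and that of the test with an outside measure, are ordinarily computed as product-moment coefficients. The sketch below assumes that coefficient; all of the scores and ratings are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


first_trial = [10, 20, 30, 40, 50]    # hypothetical scores
second_trial = [12, 19, 33, 41, 48]   # same pupils, test repeated
teacher_rank = [2, 1, 3, 5, 4]        # hypothetical outside measure

print(round(pearson_r(first_trial, second_trial), 3))  # reliability
print(round(pearson_r(first_trial, teacher_rank), 3))  # validity
```

The first coefficient corresponds to what the text calls reliability and the second to validity, the latter being only as good as the outside measure against which the test is compared.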

THE CHIEF GROUP POINT SCALES 


List of the chief group point scales in use, giving the title of the test, the author, publisher, 
and reference to published accounts of the test. 

1. Chapman Group Intelligence Examination without Prepared Blanks, J. 
Crosby Chapman. 

J. Crosby Chapman. “Intelligence Examination without Prepared 
Blanks”; in Journal of Educational Research, vol. 6, pp. 777-86. 1920. 
Also Journal of Educational Research, vol. 11, pp. 269-79. 1925. 

2. Cole-Vincent. L. W. Cole and Leona Vincent. Bureau of Measure- 
ments, State Normal School, Emporia, Kan. 

L. W. Cole. “Prevention of the Lockstep in Schools”; in School and 
Society, p. 211. February 25, 1922. 

H. G. W. Frasier, formerly Director of Classification, Denver Public 
Schools, reports a comparison of results obtained by the Dearborn, the 
Detroit, and the Cole-Vincent test — 1000 children for each. Journal 
of Educational Research, vol. 6, no. 3, p. 269. October, 1922. 

3. Rhode Island Intelligence. Grace E. Bird and Clara E. Craig. Public 
School Publishing Co., Bloomington, Illinois. 
Grace E. Bird. “The Rhode Island Intelligence Test”; in Journal of 
Educational Research, vol. 8, pp. 397-403. 1923. 

4. An Absolute Intelligence Scale. H. Woodrow and Grace Arthur. 

H. Woodrow and Grace Arthur. “An Absolute Intelligence Scale”; 
in Journal of Applied Psychology, vol. 3, pp. 118-37. 1919. (This test 
is being revised by the authors, but is not yet published.) 

5. Pintner-Cunningham Primary Mental Test. R. Pintner and B. V. 
Cunningham. World Book Company. 

R. Pintner and B. V. Cunningham. “The Problem of Group Intel- 
ligence Tests for Very Young Children”; in Journal of Educational 
Psychology, vol. 13, pp. 465-72. 1922. 

6. Dearborn Group Tests of Intelligence, Series I and Series II. W. F. 
Dearborn. J. B. Lippincott Company. 

W. F. Dearborn and E. A. Lincoln. “How the Dearborn Intelligence 
Examination Standards were Obtained”; in Journal of Educational 
Psychology, vol. 13, pp. 295-297. 1922. 

W. F. Dearborn and E. A. Lincoln. “Revision of the Dearborn In- 
telligence Examinations”; in Journal of Educational Psychology, vol. 
14, pp. 39-46. 1923. 

7. Delta 1. M. E. Haggerty. World Book Company. 

Virginia Public Schools Education Commission Report. Richmond, 
Va., 1919. 

Rural School Survey of New York. Section on Educational Achieve- 
ment. 

8. Detroit First Grade Intelligence Test. Anna M. Engel. World Book 
Company. See Detroit Educational Bulletin, November, 1920. 
9. Kingsbury Primary Group Intelligence Scale. F. A. Kingsbury. Public 
School Publishing Company. 

10. Myers’ Pantomime Group Intelligence Test. Garry C. Myers. Newson 
& Co. 

11. Otis Group Intelligence Scale, Primary Examination. Arthur S. Otis. 
World Book Company. 

12. Pressey Mental Survey Tests, Primer Scale. L. W. Pressey. Public 
School Publishing Company. (Revised form: Primary Classification Test.) 

Luella W. Pressey. “A Group Scale of Intelligence for Use in 
the First Three Grades”; in Journal of Educational Research, vol. 1, 
pp. 285-94. 1920. 

Luella W. Pressey. “The Primary Classification Test”; in Journal 
of Educational Psychology, vol. 9, pp. 305-14. 1924. 

13. Delta 2. M. E. Haggerty. World Book Company. 

M. E. Haggerty. “Intelligence Examination, Delta 2”; in Journal 
of Educational Psychology, vol. 14, pp. 257-77. 1923. 

14. Illinois Examination, I and II. Walter S. Monroe and B. R. Bucking- 
ham. Public School Publishing Company. 

W. S. Monroe. The Illinois Examination. University of Illinois, 
Bulletin 19, no. 9. 1921. (Bureau of Educational Research Bulletin 
no. 6.) Urbana, University of Illinois, 1921. 

15. National Intelligence Tests, Scales A and B. National Research Coun- 
cil Committee, Haggerty, Terman, Thorndike, Whipple, Yerkes. World 
Book Company. 

G. M. Whipple. “The National Intelligence Tests”; in Journal of 
Educational Research, vol. 4, pp. 16-31. 1921. 

L. M. Terman and Edith D. Whitmire. “Age and Grade Norms for 
the National Intelligence Tests, Scales A and B”; in Journal of Educa- 
tional Research, vol. 3, pp. 124-32. 1921. 

16. Otis Intermediate Examination, Self-Administering Test. Arthur S. 
Otis. World Book Company. 

17. Pintner’s Non-Language Mental Tests. R. Pintner. College Book 
Company, Columbus, Ohio. 

R. Pintner. “A Non-Language Group Intelligence Test”; in Jour- 
nal of Applied Psychology, vol. 3, pp. 199-214. 1919. 

18. Pressey’s Group Point Scale for Measuring General Intelligence. S. L. 
and L. M. Pressey. Indiana University, Department of Psychology. 

S. L. and L. M. Pressey. “A Group Point Scale for Measuring Gen- 
eral Intelligence, with First Results from 1100 School Children”; in 
Journal of Applied Psychology, vol. 2, pp. 250-69. 1918. 

19. Pressey’s Mental Survey Scales “Cross-out” Tests, Schedule E. S. L. 
and L. M. Pressey. Indiana University, Department of Psychology. 

S. L. and L. M. Pressey. “‘Cross-out’ Test, with Suggestions as to a 
Group Scale of the Emotions”; in Journal of Applied Psychology, vol. 
3, pp. 138-50. 1919. 
S. L. and L. M. Pressey. “A Brief Group Scale of Intelligence for 
Use in School Surveys”; in Journal of Educational Research, vol. 11, pp. 
89-100. 1920. 
20. Thorndike Standardized Group Examination of Intelligence Independent 
of Language. E. L. Thorndike. 
E. L. Thorndike. “A Standardized Group Examination of Intelli- 
gence Independent of Language”; in Journal of Applied Psychology, 
vol. 3, pp. 13-32. 1919. 
21. Whipple's Group Tests for Grammar Grades. G. M. Whipple. 
G. M. Whipple. Classes for Gifted Children, Public School Publish- 
ing Company, 1919. 
Helen Davis. “The Validity of the Whipple Group Test in the 
Fourth and Fifth Grades”’; in Journal of Educational Research, vol. 5, 
pp. 239-44. 1922. 
22. Morgan's Mental Test. John J. B. Morgan, Clio Press, Iowa City, Iowa. 
23. Army Alpha, Committee of American Psychological Association; Bu- 
reau of Educational Measurements and Standards, State Normal School, 
Emporia, Kansas; also Psychological Corporation, 3939 Grand Central 
Terminal, New York City. 
C.S. Yoakum and R. M. Yerkes. Army Mental Tests. Henry Holt 
& Co., 1920. 
R. M. Yerkes. Intelligence Examining in United States Army. 
Washington Academy of Sciences, 1921. 
24. Brown University Psychological Examination. StephenS. Colvin. J. B. 
Lippincott Company. 
25. Miller’s Mental Ability Test, W.S. Miller. World Book Company. 
26. Otis Group Intelligence Scale, Advanced Examination. Arthur S. Otis. 
World Book Company. 
A. S. Otis. “An Absolute Point Scale for the Measurement of 
Intelligence”; in Journal of Educational Psychology, vol. 9, pp. 239-61 
and 333-48. 1918. 
27. Roback Mentality Tests for Superior Adults. A. A. Roback. 
A. A. Roback. ‘Subjective vs. Objective Tests”; in Journal of 
Educational Psychology, vol. 12, pp. 439-44. 1921. 
28. Terman Group Test of Mental Ability. L. M. Terman. World Book 
Company. 
~ 29. Thorndike Intelligence Examination for High School Graduates. KE. L. 
Thorndike, Teachers College, Columbia University. 
E. L. Thorndike. “Intelligence Examinations for College Entrance”’; 
in Journal of Educational Research, vol. 1, pp. 329-37. 1920. 
Ben D. Wood. Measurement in Higher Education. World Book 
Company. 1923. Chaps. 3, 4, and 5. 
30. Thurstone Psychological Examination for College Freshmen and High 
Schocl Seniors. L. L. Thurstone, C. H. Stoelting. 
L. L. Thurstone. ‘A Cycle-Omnibus Test for College Students”’; in 
Journal of Educational Research, vol. 4, pp. 265-78. 1921. 
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L. L. Thurstone. “Intelligence Tests for Engineering Students”; in 
Engineering Education, February, 1923. 

31. Otis Classification Test. Arthur S. Otis. World Book Company. 

32. Pintner Mental-Educational Survey Test. (For the Mental Test, see the 
Non-Language Test above. The data for the Educational Test are given 
here.) R. Pintner. College Book Company, Columbus, Ohio. 

R. Pintner. “A Combined Mental-Educational Survey”; in Journal
of Educational Psychology, vol. 12, pp. 32-43. 1921. 

33. Stanford Achievement Test, T. L. Kelley, G. M. Ruch, and L. M. Ter- 
man. World Book Company. 

34. Myers’ Mental Measure. Caroline E. Myers and Garry C. Myers. 
Newson & Co. 

Garry C. Myers. Measuring Minds. Newson & Co., 1920. 

35. The Mentimeters. M. R. Trabue. Doubleday, Page & Co.

M. R. Trabue and F. P. Stockbridge. Measure Your Mind. Doubleday,
Page & Co., 1920.

36. Trabue Sentence Completion Scales. M. R. Trabue, Bureau of Publica-
tions, Teachers College, Columbia University. 

M. R. Trabue. Completion Test Language Scales. Teachers College 
Publications, 1916. 


CHAPTER VIII 
TESTS OF PERSONALITY TRAITS 


INTELLIGENCE tests have made a large contribution to the 
analysis of the capacity of pupils to do school work. The 
scores on intelligence tests have a fair correlation with the 
achievement of pupils in their courses. The correlation is 
very far from perfect, however, and there is clear evidence 
that the comparatively low correlation is due, not simply to 
errors in the measurement of intelligence and of achieve- 
ment, but also to the presence of other factors in achieve- 
ment besides intelligence. An attempt to explain the 
discrepancy between intelligence and achievement in 
individual cases frequently brings convincing evidence that 
this discrepancy is due to some characteristic of the indi- 
vidual’s temperament rather than to his intellectual capac- 
ity. A complete measurement of the factors in school work, 
or in achievement in general, therefore, must include other 
traits besides intelligence. 

These other traits have been grouped loosely under the 
general head of personality. Personality is not a technical, 
psychological term, but it may serve for convenience to 
include a number of varieties of mental traits which are not 
intellectual and yet which depend, in some measure at least, 
upon the individual’s native or inherited make-up. As in the
case of intelligence, we may proceed in the attempt to 
measure these traits without settling the ultimate question 
of their origin. We need only assume that the traits which 
we are measuring are at least in considerable part due to 
nature as contrasted with nurture. 

A number of attempts have been made to classify the
personality traits, and there is large diversity in the groupings
which have resulted from these attempts. We cannot
hope at the present stage of the measurement of personality 
to reach an entire agreement upon such a classification. In 
this case, perhaps the most useful procedure is to follow the 
classification which is suggested in the tests of personality 
traits themselves. A survey of the tests seems to indicate a 
natural grouping under four heads, as follows: will tempera- 
ment, emotional temperament, moral disposition, and 
aesthetic sensibility. All of these, it will be observed, are
fairly distinct from the processes of perceiving, understand- 
ing, and thinking, which are classified as intellectual pro- 
cesses. 


1. Tests of will temperament 


Will temperament designates the characteristics of the 
individual’s overt reactions. Thus, a person may react to 
the stimuli of his surroundings energetically or weakly. He 
may, in general, react promptly or slowly. He may be 
persistent or vacillating. He may proceed cautiously and 
carefully or recklessly. His ideas may work themselves out 
into actions easily, or there may seem to be a blocking or 
obstruction which must be overcome before the action can 
take place. 

In order that we may test such traits as these they must of 
course exist as general characteristics, and not simply as 
particular forms of reaction to specific circumstances. A 
person must be one of quick decision or of slow decision in 
general, and not simply with a disposition to decide promptly 
or deliberately in one particular situation. It must be 
obvious that a person’s reactions are affected to some 
extent by the nature of the circumstances. A person may 
react very explosively toward another who is weaker than 
himself, and cautiously toward one whom he fears; but the
assumption is that, underlying these diversities due to the
circumstances, there is a general trend which may be dis- 
covered by testing all individuals under the uniform condi- 
tions which are set up by a standardized test. 

The most elaborate test of will temperament is the one 
devised by June E. Downey.1 This test was the outgrowth
of a prolonged series of investigations of handwriting and of 
muscle reading. In the study of these forms of behavior 
Downey was impressed with large variations in the reactions 
of different individuals, and with the resemblance between 
their reactions in these specific forms of activity and their 
general conduct. She found handwriting a very convenient 
mode of behavior to test, and found that the behavior of 
individuals when they write, under a variety of conditions, 
gives a very good indication of their will temperament type. 
Handwriting, along with two or three additional forms of 
reaction, then, constitutes the subject-matter of the Downey 
test. 

Downey thinks of the will temperament as based, in the 
final analysis, chiefly upon two fundamental factors. The 
first of these is “the amount of nervous energy at the disposal
of the individual,” and the second is “the tendency of
such nervous energy to discharge immediately into the motor 
areas and innervate the muscles and glands, or, on the 
contrary, to find a way out by a roundabout path of dis- 
charge.” 2 The individual’s behavior pattern, then, is due 
fundamentally to the fund of energy which he possesses, and 
to the openness or the blocking of the paths of discharge. 
There may exist various combinations of these two con- 
ditions. 


1 June E. Downey. The Will Temperament and Its Testing. New York: 
World Book Company, 1923. This book contains an account of the theory 
of will temperament testing, of the test itself, and of typical will profiles. 

2 The Will Temperament and Its Testing, p. 59.

A more detailed analysis of the will traits brings us to a
division into three groups. These are “(1), those of speed
and fluidity of reaction; (2), those of forcefulness and
decisiveness of action; and (3), those of carefulness and
persistence of reaction.” 1

Speed and fluidity of reaction is measured in Downey’s test 
under four heads. We may take these up in turn and 
describe the tests by which they are measured. 

The first characteristic which is measured under this head 
is speed of movement. The test requires the subject to write 
the words “United States of America” at his ordinary speed.
This test assumes that individuals, if left to themselves, adopt 
a characteristic speed of movement, and that handwriting is 
a typical activity which fairly represents the individual’s 
general speed of movement. The second part of the assump- 
tion is subject to exception in the case of persons of special 
training or special lack of training, but it is thought to hold 
for most individuals who have an ordinary education. 

The second test is for freedom from load. It assumes that 
some persons habitually work near their maximum level of 
achievement, and that others are subject to a load or in- 
hibition which keeps their activities at a level below their 
maximum. The amount of load is measured by comparing 
the speed of ordinary writing with the speed of maximum 
writing. The individual is directed to write his name and 
the words “United States of America” as rapidly as possible.
The ratio is then found between the time of the normal 
writing and the time of the speeded writing. This ratio 
will, of course, ordinarily be greater than one. Expressed as a
percentage, a low ratio
of 104 to 115 indicates that the normal time is very little
greater than the speeded time, and that the individual is 
characterized by marked freedom from load. A high ratio 
of from 150 to 220 indicates a great difference between the 

1 June E. Downey. The Will Temperament and Its Testing, p. 62.

two rates of writing, and indicates a great load. Such an
individual requires a strong stimulus to induce him to work 
as rapidly as he is capable of working. 
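
The computation of load described above may be sketched in a few lines of Python. This is only an illustration: the function names and the sample times are assumptions, and the only values taken from the text are the two ranges, 104 to 115 and 150 to 220, which are evidently stated as percentages.

```python
def load_ratio(normal_seconds, speeded_seconds):
    """Return normal writing time over speeded writing time, as a percentage."""
    if normal_seconds <= 0 or speeded_seconds <= 0:
        raise ValueError("writing times must be positive")
    return 100.0 * normal_seconds / speeded_seconds


def describe_load(ratio):
    """Label a ratio using only the two ranges quoted in the text."""
    if 104 <= ratio <= 115:
        return "marked freedom from load"
    if 150 <= ratio <= 220:
        return "great load"
    # The text does not characterize any other values.
    return "outside the ranges quoted in the text"


# Hypothetical subject: 22 seconds at ordinary speed, 20 seconds when hurried.
ratio = load_ratio(22.0, 20.0)
print(round(ratio), describe_load(ratio))
```

A ratio close to 100 means that the ordinary pace is already near the subject's maximum; the larger the ratio, the greater the reserve that only a strong stimulus brings into play.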

The third measure of fluidity of reaction is a test for 
flexibility. It consists in an attempt to disguise one’s writing 
of the words “United States of America.” Some persons 
are able readily to disguise their handwriting, either through 
the possession of a dramatic or histrionic type of tempera- 
ment or through the exercise of ingenuity. The amount of 
disguise is scored by comparison with a scale of specimens. 

The final test for fluidity of reaction measures the speed of 
decision. The individual is presented with a list of twenty-
two pairs of opposite traits, and is asked to check the one of 
each pair which characterizes himself. If he prefers he may 
grade himself on the two traits instead of merely checking 
the one which is characteristic. Examples of the pairs are: 

careful, careless
cautious, daring
ambitious, unambitious
punctual, tardy


The purpose of the test is not at all to determine whether the 
person rates himself accurately, but only whether he decides 
promptly or deliberates long. Great differences between 
individuals in performing this simple task are found. The 
interesting feature of the situation is that an individual usu- 
ally finds good and sufficient reason for reacting as he does, 
whether rapidly or slowly. The individual does not realize 
that his mode of reaction is an expression of his individual 
temperament, and that another person may justify an 
entirely different type of reaction with as valid reasons as 
the one he gives himself. 

The second group of four traits measures forcefulness or 
decisiveness of reaction. A person who stands high in these 
traits may be described in a general way as an aggressive
individual. The first test measures what is called motor
impulsion. This term designates the amount of energy 
which is behind one’s actions. One’s movement may be 
rapid or slow, vigorous or weak, and yet these differences 
may not represent accurately the amount of force behind 
the action. This is because the action may be inhibited or 
blocked in some fashion. The amount of motor impulsion is 
determined by setting the conditions so that the action will 
be free from inhibiting factors, or so that the action will be 
more spontaneous than usual. This is done by making it 
automatic; that is, by diverting the attention of the indi- 
vidual from what he is doing. The individual is required to 
write, first, with his eyes closed; second, while counting by 
threes with his eyes open, and again with his eyes closed; and 
third, by writing while he is counting the taps of a pencil by 
twos. If the size of the writing under these conditions is 
greater than one’s ordinary writing, a degree of motor impul- 
sion above the average is indicated. If the writing becomes 
smaller, a low degree of motor impulsion is indicated. In- 
crease in speed over the normal also indicates high motor 
impulsion, and decrease in speed a deficiency in motor im- 
pulsion. 

The second test of this group measures the reaction to 
contradiction. In the early part of the test period the 
individual is asked to make a purely arbitrary choice between 
two envelopes. The envelopes are then put aside while the 
other tests are given. In the present test he is asked to state 
which envelope he chose. The examiner then contradicts his 
statement. The mode of his reaction to this contradiction is
the basis of the rating. For example, if the subject throws 
the burden of proof upon the examiner, or suggests that 
the examiner is in error, or exhibits angry or suspicious 
behavior, he is scored ten in reaction to contradiction. If at 
the other extreme he makes some such remarks as “You
fooled me that time,” or gives in when the examiner says,
“Are you sure? I thought it was —” (naming the opposite 
envelope), or says that the envelope is forgotten, he is given 
the lowest score, namely one. 

The third trait of this group is resistance to opposition. 
This is measured by having the subject write blindfolded and 
then placing an obstacle in front of his pen and noting his 
reaction. Very strong resistance to opposition, for example, 
is represented by exerting strong pressure against the 
obstacle, maintaining the writing at its initial level by a 
firm, strong stroke, and usually with enlarged characters, 
the subject requiring no urging. The lowest grade is given 
to one who shows absolute passivity in spite of urging. A 
typical remark is, “I can’t,” or, “How can I when you stop 
me?” Here, again, the individual is sure to justify his re- 
action, of whatever type it may be, but the reaction is due, 
not to his judgment as to what he should do, but to his 
temperamental characteristic. 

The last trait of this group is finality of judgment. At the 
end of the examination the individual is given the pairs of 
traits which were presented to him at the beginning and 
asked to make any changes which he wishes in his rating of 
himself. The degree of finality of judgment is measured by 
the shortness of time which the individual requires for this 
rechecking. If he is very well satisfied with his original 
judgment he takes a short time. If, however, he has a dis- 
position to revise his judgment, he takes longer time. 

The last group of four traits represents carefulness and 
persistence of reaction. Capacity for inhibition, which may 
perhaps be regarded as the basis of control, is measured by 
requiring the individual to slow down his writing. He uses 
the same phrase as before, and is instructed to write it just as 
slowly as possible and still keep the pencil moving. He is 
told, “Some people take thirty minutes to write the phrase.
Do not enlarge your writing.” For some persons this is a
tremendously irritating task. They fly to pieces and find it 
apparently impossible to comply with the direction. A 
score of five is given to a person who can devote about two 
minutes to the task. A score of ten is given to one whose 
time is longer than eight minutes and fifty seconds, and a 
score of one to a person who cannot take more than twenty- 
six seconds. 

The second trait is interest in detail. The individual is 
asked to copy a particular specimen of handwriting, first, as 
exactly as possible, taking all the time he wishes, and second, 
without as great emphasis upon exactness and at the 
individual’s own natural speed. The degree of interest in 
detail is measured first by the accuracy of the imitation, and 
second by the excess in time taken in careful imitation with- 
out special instruction. 

The third test in this group measures coordination of
impulses. The individual is required to write rapidly the 
phrase “United States of America,” on a line about one 
and one quarter inches long. His score depends upon the 
degree to which the rapidity of the writing approximates 
the former speed of writing, and the completeness with which 
the individual keeps within the line. The successful indi- 
viduals are the ones who can keep in mind both the require- 
ment of speed and the limitation of extent. The ones who 
fail neglect either the one or the other of these two re- 
quirements. 

The last test is called volitional perseveration. In the test 
for flexibility the individual is directed to practice the 
disguise of his handwriting as long as he wishes on the back of 
the sheet. He is instructed, “Take all the time you wish and
do your best.” The amount of time taken may vary from
twenty-five seconds to fourteen and one half minutes. The 
time in this exercise is taken as a measure of one’s natural 
persistence.

The scores on the various parts of the will temperament
tests are represented in the form of a profile. This profile 
shows at a glance the traits in which the individual scores 
high, and those in which he scores low, and enables one to 
judge of the general character of the individual’s will 
temperament. A specimen profile is shown in Fig. 10. It 
will be noticed that each of the tests is scored on a scale from 
zero to ten.

The individual whose profile is before us is apparently low
in speed and fluidity of reaction, represented in the first four 
traits. He is very slow in speed of movement and in speed of 
decision, is characterized by considerable load, and is not 
very flexible in reaction. The writer happens to be well 
acquainted with the individual and can testify that the 
record of the test in these respects is entirely correct. In 
two of the second group of traits the individual scores high 
and in two low. There is a small amount of motor impulsion,
and not very vigorous reaction to contradiction. On the 
other hand, the individual pursues his course of action vigor- 
ously when he meets opposition, and holds rather tenaciously 
to his judgments when he has once made them. He may 
revise them when contradicted, but is not inclined to question 
them spontaneously. In the last group also, there are two 
high and two comparatively low records. Coordination of
impulses is the lowest of this group and volitional persevera- 
tion is the highest. Interest in detail is also relatively high, 
and the ability to inhibit reactions somewhat under the 
average. The high rating on volitional perseveration is 
certainly characteristic of this individual. This is probably 
his outstanding trait. Lack of good coordination of impulses
is not as evident. The profile, as given by the test, agrees 
very closely with one which was based upon the estimate 
of close acquaintances, the correlation being about .65. 
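
A profile of the kind discussed above may be laid out as plain text by a short Python sketch. The grouping of the twelve traits follows the text; the function name, the layout, and above all the sample scores are assumptions made for illustration, and are not the values of Fig. 10. Whole-number scores are assumed.

```python
from statistics import mean

# The three groups of four traits, named as in the text.
TRAIT_GROUPS = {
    "speed and fluidity": [
        "speed of movement", "freedom from load",
        "flexibility", "speed of decision",
    ],
    "forcefulness and decisiveness": [
        "motor impulsion", "reaction to contradiction",
        "resistance to opposition", "finality of judgment",
    ],
    "carefulness and persistence": [
        "capacity for inhibition", "interest in detail",
        "coordination of impulses", "volitional perseveration",
    ],
}


def render_profile(scores):
    """Return a text profile: one bar per trait, with a mean for each group."""
    lines = []
    for group, traits in TRAIT_GROUPS.items():
        group_scores = [scores[t] for t in traits]
        if any(not 0 <= s <= 10 for s in group_scores):
            raise ValueError("each score must lie between 0 and 10")
        lines.append(f"{group} (mean {mean(group_scores):.1f})")
        for trait, score in zip(traits, group_scores):
            lines.append(f"  {trait:<26}{'#' * score} {score}")
    return "\n".join(lines)


# Hypothetical whole-number scores, NOT the values of Fig. 10.
example = {
    "speed of movement": 3, "freedom from load": 4,
    "flexibility": 4, "speed of decision": 5,
    "motor impulsion": 2, "reaction to contradiction": 6,
    "resistance to opposition": 8, "finality of judgment": 7,
    "capacity for inhibition": 5, "interest in detail": 7,
    "coordination of impulses": 3, "volitional perseveration": 9,
}
print(render_profile(example))
```

The group means are added only as a convenience for reading the profile; the text itself describes the individual trait scores and leaves the judgment of each group to the reader.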

An isolated case, of course, is not sufficient basis for con- 
sidering the test valid. It is necessary, therefore, to 
examine the studies which have been made for the purpose of 
determining its degree of accuracy and validity. Downey 
herself (in Chapter XI) summarizes much of the work which
has been done upon her test. We may notice in addition two 
sample studies. The first of these studies is by Meier.1


1 Norman C. Meier. “Study of the Downey Test by the Method of
Estimates”; in Journal of Educational Psychology, vol. 14, pp. 385-95. 1923.

Meier gave the individual test to one hundred high school
students. He collected, on as many of these students as 
possible, the judgment of three sets of judges on all of the 
traits which are measured in the test. These judges 
consisted of teachers, parents, and friends. The correlations
between the ratings of the judges and the scores on the test
were disappointingly low. They were all positive, but the
highest was .24. This correlation was found in the case of
motor inhibition. The average correlation of the pooled 
estimates of the three groups of judges and the tests was .118. 
This lack of agreement may, of course, be due, not to the im- 
perfection of the tests, but to the unreliability of the judg- 
ments. This unreliability may be due to the inability of 
persons to rate such characteristics as are measured by the 
test, or it may be due to their failure to understand the 
categories which constitute the scheme. We are accustomed 
in daily life to judge people according to certain conventional 
categories. Those which are used in the scale represent a 
somewhat different type of analysis from that which prevails 
in popular thinking. Every effort was made in the study by 
Meier to give a clear and full definition of all the terms, 
but this effort may not have been wholly successful. 
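
The comparison of test scores with pooled estimates may be sketched in Python as follows. The sketch assumes that the estimates of the three groups of judges are pooled by simple averaging and that the ordinary product-moment (Pearson) coefficient is used; the text does not say how Meier pooled the ratings. The data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list is constant")
    return sxy / sqrt(sxx * syy)


def pooled_estimates(*judge_groups):
    """Average the ratings given to each pupil by the several judge groups."""
    return [mean(ratings) for ratings in zip(*judge_groups)]


# Hypothetical ratings of five pupils on one trait (scale of 1 to 10).
teachers = [4, 6, 5, 7, 3]
parents = [5, 5, 6, 8, 4]
friends = [3, 7, 5, 6, 5]
test_scores = [6, 4, 5, 7, 2]

pooled = pooled_estimates(teachers, parents, friends)
print(round(pearson_r(test_scores, pooled), 3))
# Agreement between two sets of judges, as a check on the estimates.
print(round(pearson_r(parents, friends), 3))
```

The same function serves both for the agreement between test and estimate and for the agreement among the judges themselves, which the text uses as a check upon the reliability of the judgments.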

A check upon the reliability of the judgments may be 
found in the correlation between the different sets of judges. 
This correlation we find also low. The highest average is 
between the judgment of the parents and that of the friends. 

It is also possible to measure the reliability of the test it- 
self. The most direct method of doing this would be to 
repeat the test. Instead of this Meier gave the group test to 
the same individuals to whom he gave the individual test. 
The group test differs in a number of particulars from the 
individual test. The correlation between the scores of the 
two tests was high in five of the traits, and low in seven. 
This discrepancy may be due in part to the difference between
the group test and the individual test, and in part to the fact
that a number of the tests cannot be scored objectively but 
require the exercise of judgment. 

Buchanan made a study of the test by comparing the 
standing of a group of school children in the test with the 
judgment of teachers on the same traits.1 Buchanan
calculated the correlations between certain of the teachers’ 
estimates themselves and between the estimates and the 
scores on the tests. He found that, in general, the correlations
between the estimates of the three teachers were somewhat
higher than the correlation between the estimates and
the test. He made the test in a lower sixth and an upper 
sixth grade. In the lower grade, six out of nine correlations 
between the tests and the estimates were negative. In the 
higher grade, four out of six were positive. A number of 
correlations between the teachers’ estimates themselves were 
rather high, but a number of them also were very low. 

The result of these studies, then, seems to be inconclusive, 
and indicates that refinement is necessary in both the tests 
and the estimates in order that the tests may be relied upon 
to the same extent as we rely upon our intelligence tests. 
Simplification in the methods of giving and of scoring the test 
is desirable. Whether revision of the classification and 
definition of the traits which are tested is desirable must be 
determined by further research. 

It is evident from the facts which have been presented that 
the Downey Will Temperament Test, which is the most care- 
fully standardized and most highly elaborated personality 
test which has yet been devised, is still unsuitable for wide- 
spread routine application in the school. It is still some- 
what in the experimental stage. It can be used profitably 


1 W. D. Buchanan. A Study of Sixth-Grade Pupils through a Comparison
of Teachers’ Estimates and Downey Group Test Scores. Unpublished Master’s
Thesis in the Library of the University of Chicago, 1922.

only by those who are able and disposed to make a thorough
study of the test and of the principles on which it is based, 
and to treat the results largely with suspended judgment. 
One who uses the test in this somewhat tentative fashion, 
however, may gain much insight into the personality traits of 
certain individuals. This is particularly true of individuals 
whose personality profiles are particularly marked — that 
is, those who exhibit strong contrast in the strength of 
the various traits and in whom one or more of the three 
groups of traits is conspicuously marked by strength or by 
weakness. 

In those cases which are illuminated by the test, one of 
several things may happen. One possible outcome, of value 
to the administrator and to the child himself, is the discovery 
of strong traits which have not been evident to casual 
observation. If a characteristic which has been previously 
suspected is confirmed by the test, the confidence of the 
teacher in his judgment is strengthened and he has a firmer 
basis for treatment. Even the confirmation of the existence 
of a suspected weakness may be of value. The existence of 
an individual native peculiarity is not to be regarded at all 
fatalistically. The weakness may be overcome in part in one 
or both of two ways. In the first place, it may be compen- 
sated for by an emphasis upon some other trait which is by 
nature stronger. Thus, a person who reacts slowly may 
compensate for this weakness by unusual accuracy. In 
addition to this it is possible by training to build up the re- 
action which is by nature weak. 

The use of the Downey test in the individual treatment of 
high school students is illustrated in an article by Reavis,1 of
the University of Chicago High School. Reavis has found 


1 W. C. Reavis. “Utilizing the Results of the Downey Individual Will-
Temperament Test in Pupil Administration”; in School Review, vol. 33, pp.
174-83. 1925.

that it is possible to use the results of this test in certain cases
to bring to the consciousness of pupils a knowledge of their 
personal resources, and by this means to stimulate in them 
a more effective attack upon their work. The publication of 
this and of other reports of a like nature may serve to show in 
detail how the tests may be used to promote a useful analysis 
of personality, and possibly also to indicate or to suggest how 
the tests may be further developed. 

In addition to this rather detailed examination of the 
Downey test, we may briefly mention other will tests. The 
first is the so-called “volometer” by G. G. Fernald.1 The
Fernald test does not, as does the Downey test, attempt to 
analyze the individual’s reaction and derive a profile from 
such an analysis, but aims simply to measure the degree of 
persistence and energy of the individual’s voluntary reaction. 
This is done, in brief, by recording the length of time which 
the individual can stand on his toes. A record is made by 
an instrument in which the individual’s movements are re- 
corded upon a dial at level with his eyes as he stands upon 
the platform. The hypothesis is that this test gives a 
measure of the individual’s general power of achievement, or 
will capacity. The author found that reform school boys 
were decidedly inferior in this performance to high school 
boys, and he concludes that this indicates a difference in 
will power which is responsible for the delinquency of the re- 
form school boys. In the interpretation of this test one must 
raise the question whether the difference which was found 
was due to a difference in achievement capacity, or whether 
it was also due to a difference in the social situations in which 
the individuals found themselves and in the stimulus to 
effort which the social situations gave. 

We may perhaps class under will tests also the Moore and 


1 G. G. Fernald. “An Achievement Capacity Test”; in Journal of Educa-
tional Psychology, vol. 3, pp. 331-36. 1912.

Gilliland tests.1 These authors tried out a number of tests
and correlated them with the judgment of the aggressiveness 
of the individuals tested. The one which they found to give 
the highest correlation, and which they judged to constitute 
a fairly reliable test taken by itself, is the ability of the 
individual to look the examiner in the eye and at the same 
time carry on intellectual tasks, in this case performing 
arithmetic problems. We apparently have experimental 
evidence that this ability, which has long been regarded from 
general observation as being a measure of the individual’s 
social force, constitutes a good behavioristic measure of his 
social attitude. 

A modification of Muensterberg’s so-called “judgment 
test,” which he devised as a means of selecting ship captains, 
was made by Gibson and used as a test of decision. As 
revised, the test consists of twenty-four cards. The various 
cards have printed upon them the letters e, 7, y, and k in 
different frequencies. The individual is required to tell 
which letter occurs on a particular card with the greatest 
frequency.2 The speed and accuracy of the judgment are
recorded. Only a comparison of the reactions of men and
women is given in the article. It is found, for example, that 
the men were more rapid but less accurate in their judgment. 
Several types of decision reaction were distinguished — for 
example, the quick, ungraded, inaccurate type. Ungraded 
means that more time is not spent on the hard decisions than 
on the easy ones. The second type is that of the slow, 
graded, accurate person, and finally there is suggested a 
person who is frequently blocked in decision, who has oc- 
casional slow decision time. 


1 H. T. Moore and A. R. Gilliland. “The Measurement of Aggressiveness”;
in Journal of Applied Psychology, vol. 5, pp. 97-118. 1921.

2 Sybil M. Gibson. “A Decision Study of One Hundred Fifty Young
Men and Women”; in Journal of Applied Psychology, vol. 4, p. 364. 1920.


2. Test of emotional tone, temperament, and interest


Emotional temperament refers to subjective reactions in 
contrast to the overt reactions which fall under the head 
of volitional temperament. Again, without attempting a 
precise definition, we may designate what is meant by illus- 
trations. We pass judgment concerning emotional tempera- 
ment when we say that one person is characterized by a 
prevailing mood of depression, whereas another is subject 
chiefly to the mood of elation; when we say that one person 
is a confirmed optimist and another a pessimist; when we 
distinguish between an enthusiastic and an apathetic tem- 
perament. All these descriptions refer to the individual’s 
prevailing feeling tone. These feeling tones are, of course, 
related to forms of expression in conduct, but the feeling tone 
and the conduct may be distinguished, and it may be 
possible and profitable to test them separately. 

The emotions may be tested in two ways, the indirect and 
the direct. The direct method would involve arousing 
particular emotions, or putting a person in a situation in 
which various phases of temperament or various interests 
might be expressed, and then measuring his reaction under 
these conditions. The indirect method may be carried out 
by using words as stimuli and asking the individual to 
respond by words. This method may be carried out by 
investigating the reaction which the individual reports that 
he experiences when he thinks of the word. Another indirect 
method is to set up an imaginary situation and ask the 
individual to report what his reaction would be in the 
imagined case. It will be seen that the indirect method 
involves the use of some degree of introspection. That is,
instead of his reaction being observed directly, the person is
asked to report what his own reaction is.

All of the tests of emotion, temperament, and interest 
which have been worked out are of the indirect sort. This is 
not to say that there have not been attempts in the laboratory
to measure emotion. Such attempts have been made. The 
measurement of emotion, however, is not the same thing as 
the application of a test to determine the general or charac- 
teristic emotional tone of an individual which differentiates 
him from other individuals. Direct measurements have not 
yet been used to work out such a test as this. 

The most ambitious attempt to work out a test of the 
emotions is that which has been made by Pressey. He calls 
the test a group scale for investigating the emotions.1
Pressey’s scale makes use pretty largely of experience with 
abnormal mental attitudes, and emphasizes the pathological 
emotional conditions. The four tests may be described as 
follows. The first test aims to discover various special types 
of unpleasant feeling. In the second test we find an adapted 
form of the free association test, which again seeks to un- 
cover pathological and criminological attitudes. The third 
test is an ethical discrimination test, and the fourth test is 
aimed to discover certain anxiety tendencies. The particu- 
lar character of these tests may be gathered from more 
detailed illustrations. 

In Test I the subject is instructed to cross out every word 
which is unpleasant. The first two lines are as follows: 

1. Disgust, fear, sex, suspicion, aunt 

2. Roar, divorce, dislike, sidewalk, wiggle 
An analysis of the words indicates that they are so chosen as to 
arouse different types of fear, or of unpleasant feeling. The 
types are represented in the first four words of the first list. 
In each list there is also a neutral word, which is put in as a
joker to indicate whether the individual is following the
instructions. In each succeeding line the types are represented
in the same order, except that they are dropped back
one word in the list. In the first list the joker is at the end.
In the second list, it is next to the end, and so on. The
subject is next instructed to mark the one word in the list
which is the most unpleasant. The test is scored in terms of
the total number of words which are crossed out, which is an
indication of one’s general emotionality, and the deviation in
the words which are marked as being most unpleasant from
those which are most frequently marked by people in general.

1 S. L. Pressey. “A Group Scale for Investigating the Emotions”; in
Journal of Abnormal Psychology, vol. 16, pp. 55-64. 1921. References to two
earlier articles on this test by the same author are given in the article.
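
The two scores described for Test I may be illustrated by a short sketch. The function name, the layout of the data, and the sample "most frequent" choices are assumptions made for illustration only; they are not taken from Pressey's published key or norms.

```python
def score_cross_out_test(crossed_out, most_unpleasant, common_choice):
    """Score one cross-out test in the manner described above.

    crossed_out     -- list of sets, one per line, of the words crossed out
    most_unpleasant -- list with the one word per line marked most unpleasant
    common_choice   -- list with the word most people mark on each line
                       (hypothetical norms, assumed for this sketch)
    """
    # Total emotionality: every crossed-out word counts one point.
    total = sum(len(words) for words in crossed_out)
    # Deviation: lines on which the subject's choice differs from the norm.
    deviation = sum(
        1 for own, usual in zip(most_unpleasant, common_choice) if own != usual
    )
    return total, deviation


# The first two lines of Test I, answered by an imaginary subject.
crossed = [{"disgust", "fear", "suspicion"}, {"divorce", "dislike"}]
marked = ["fear", "dislike"]
norms = ["disgust", "dislike"]  # assumed, not Pressey's actual norms
total, deviation = score_cross_out_test(crossed, marked, norms)
print(total, deviation)
```

Counting a deviation as one point per line is itself a simplification; a weighted deviation could be substituted without changing the general plan.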

Test II. This test consists of twenty-five lines of words 
such as the following: 

1. BLOSSOM, flame, flower, paralyze, red, sew 

2. LAMP, poor, headache, match, dog, light 
The subject is directed to cross out all of the words in small 
letters which are connected in his mind with the words in 
capitals at the beginning of the line. This is a free association
test in group form. The aim is to discover pathological
trends of association. 

Test III gives lists of words representing different types of
conduct. This is an adaptation of Fernald’s ethical dis- 
crimination test. Two of the lists are as follows: 

1. Begging, swearing, smoking, flirting, spitting 
2. Fear, hate, anger, jealousy, suspicion 

Test IV aims to discover anxiety tendencies. As in the 
first case, the subjects are told to cross out the names of all 
the things in each list about which they have ever worried. 
They are also told to draw a circle about the things in each 
list about which they have worried most. The first two 
lists are as follows: 

1. Injustice, noise, self-consciousness, discouragement, germs 

2. Clothes, conscience, heart-failure, poison, sleep 

The types of pathological attitudes which are represented 
in the first list are as follows. Paranoid or suspicion attitude 
is represented by worry concerning injustice; the neurotic
attitude by anxiety about noise; the self-conscious or shut-in
personality by anxiety concerning self-consciousness; marking
discouragement indicates a melancholic or self-accusatory
attitude; and marking germs, the hypochondriacal attitude. 
This test, like all the others, is scored in terms of the total 
number crossed out, and also in terms of the peculiar choices 
which are indicated by the words which are in circles. 

The scores on all of the four tests are added, and this score 
is taken to indicate total emotionality. The deviations are 
then added and the total is taken to express idiosyncrasy in 
emotion. The author does not, however, emphasize merely 
these total scores, but emphasizes the desirability of making 
an analysis of the subject’s responses. 

This test is undoubtedly a promising beginning in the 
study or the measurement of emotional reaction. It has the 
advantage of being based on definite experiences which have 
been secured in the study of the insane and the neurotic. 
This same origin of the test, however, causes it to be some- 
what limited in scope. The next step in the development of 
tests of the emotion, following this study, would seem to be 
the examination of its validity and the attempt to extend it 
to include more largely variations among normal emotional 
attitudes. 

An attempt to test the prevailing attitude of the indi- 
vidual with respect to cheerfulness or depression is reported 
by Morgan, Mull, and Washburn.1 In this brief report the
authors indicate something of their method and of the 
results which they attained by its application. The pro- 
cedure was to present to the subject in succession fifty words. 


1 Eleanor Morgan, Helen K. Mull, and M. F. Washburn. “An Attempt 
to Test the Moods or Temperament of Cheerfulness and Depression by 
Directed Recall of Emotionally Toned Experiences”; in American Journal
of Psychology, vol. 30, pp. 302-04. 1919. 

There were in all five series of fifty each. The subject was
asked to think of the first experience which was suggested by 
a word, and then to note whether it was pleasant or un- 
pleasant. If the first experience was neither pleasant nor un- 
pleasant, the subject was to note the next one which occurred 
and so on, until one was suggested which had an emotional 
tone. The emotional tone was then recorded. The scoring 
was based upon the proportion of pleasant and unpleasant 
experiences which were thus recalled. One series of fifty words
was given on each of five successive days. The validity
of the test was examined by asking associates to pass a judg- 
ment as to whether the individuals tested were prevailingly 
optimistic or pessimistic, or prevailingly cheerful or de- 
pressed. There appeared to be a fair degree of correlation 
between the results of the tests and these judgments. In 
general there was a greater disposition to judge the individual 
to be optimistic, and the test also indicated that more 
individuals recalled a majority of pleasant experiences than a 
majority of unpleasant experiences. 
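
The scoring by proportion may be sketched as follows. The function name, the coding of tones as "P" and "U," and the daily counts are assumptions invented for the illustration; the report itself does not give them.

```python
def pleasant_proportion(daily_tones):
    """Return the share of recalled experiences recorded as pleasant.

    daily_tones -- one list per day; each entry is "P" (pleasant) or
                   "U" (unpleasant), one entry per stimulus word
    """
    tones = [tone for day in daily_tones for tone in day]
    if not tones:
        raise ValueError("no recalled experiences were recorded")
    return tones.count("P") / len(tones)


# Five imaginary days of fifty words each (the counts are made up).
pleasant_per_day = [31, 28, 35, 30, 26]
record = [["P"] * n + ["U"] * (50 - n) for n in pleasant_per_day]
share = pleasant_proportion(record)
print(f"{share:.2f} of the recalled experiences were pleasant")
```

A share above one half would mark the subject as recalling a majority of pleasant experiences, which is the comparison made with the judgments of associates.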

A rather elaborate scheme for testing social attitudes and 
interests has been worked out by Hornell Hart.1 The test
presents lists of objects or events, each item in the list being 
followed by a plus sign and a minus sign. The subject is 
requested to put a circle around the plus sign of all of those 
things which he likes, and to encircle the minus sign after all 
of those things which he dislikes. He is then to encircle 
twice the five things which he likes or dislikes most, and to 
underline the one of the five about which he feels most 
strongly. Each of the lists contains a variety of types of 
objects or events, which are supposed to represent different 
types of interest. The individual is scored according to the 
frequency of the various types of interest which he indicates. 

1 Iowa Studies in Child Welfare, vol. 2, no. 4.

The stimuli and the responses are classified first into three
large groups. These represent, respectively: the sub-social
interests, the unsocialized social interests, and the socialized 
interests. Under each of these large headings are from ten 
to fifteen more specific interests. Each of these interests 
again is represented by from five to nineteen specific choices.
The choices, of course, are not classified in the test itself, but 
are classified only when the test is scored. A few examples 
from each of the three general groups may be given. 

Under the sub-social interests are interests in food. This 
is represented by interest in eating ice cream, in not having 
burned food, and in eating fried chicken. Another of the 
sub-social interests is love of physical activity, represented 
by liking to go skating and doing calisthenics. A third is 
love of music, which is represented by liking to go to concerts, 
reading musical criticisms, and community singing. Under 
the unsocialized social interests are pugnacity, which is repre- 
sented by liking to be in a good scrap, and to get ahead of 
your enemies; the desire for approval, which is represented 
by wishing to become famous; humor, which is represented 
by reading Life; and the repression of others, which is repre- 
sented by desiring stricter chaperons, prohibiting the smok- 
ing of tobacco, or suppression of immodest dancing. Finally, 
the socialized interests are illustrated by love of relatives, 
which is represented by wishing to attend a family reunion; 
by interest in organized religion, which is represented by 
liking to attend a religious revival or to read a certain paper; 
by patriotism, which is represented by wishing to serve one’s 
country or to prevent waste of public money; and by justice 
between racial and national groups, which is represented by 
wishing the abolition of lynching and desiring the improvement
of the attitude of Japan toward China. These, of
course, are merely a few illustrations out of the many items 
of the test. 
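
The manner of scoring by frequency of interest types may be sketched as follows. The key below holds only a few of the items named above, with abbreviated labels, and the rule of counting liked items alone is an assumption of this sketch; Hart's actual key and any weighting of the doubly encircled items are not reproduced here.

```python
from collections import Counter

# A few items from the text, keyed to (general group, specific interest).
# The full key has many more items; labels are abbreviated for illustration.
KEY = {
    "eating ice cream": ("sub-social", "food"),
    "eating fried chicken": ("sub-social", "food"),
    "going skating": ("sub-social", "physical activity"),
    "being in a good scrap": ("unsocialized social", "pugnacity"),
    "becoming famous": ("unsocialized social", "desire for approval"),
    "attending a family reunion": ("socialized", "love of relatives"),
    "abolition of lynching": ("socialized", "justice between groups"),
}


def tally_interests(liked_items):
    """Count liked items by general group and by specific interest."""
    groups, interests = Counter(), Counter()
    for item in liked_items:
        if item in KEY:  # unkeyed items are ignored in this sketch
            group, interest = KEY[item]
            groups[group] += 1
            interests[interest] += 1
    return groups, interests


liked = ["eating ice cream", "going skating", "eating fried chicken",
         "abolition of lynching"]
groups, interests = tally_interests(liked)
print(groups["sub-social"], interests["food"])
```

The classification is applied only at scoring time, as in the test itself, so the subject never sees the group to which an item belongs.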

The revised form of the test is based upon correlations 
which were found between the characteristics of the individuals
tested, as judged by their friends, and the way in
which they marked the various items. While the test 
depends upon the individual’s report concerning his own 
interests, it would seem to have greater validity than a 
report upon questions of a general nature. If one should 
ask an individual, for example, whether he has sympathy, 
whether he is civic-minded, whether he is interested in the 
opposite sex, and so on, the reply would be liable to be 
subject to two large sources of error. In the first place, one 
would be influenced by his view regarding the desirability of 
these traits, and, in the second place, one would hardly be 
competent to judge, unless he were presented with a series 
of examples of the manifestation of his general interests, 
whether he really possessed such interests more than the 
average or not. If a person is asked, however, to make a 
large number of particularized judgments concerning his 
interests, he is likely not to see the general bearing of them, 
therefore not to be influenced by his general predilection for 
certain types of traits; and he is likely to be enabled to make 
a more accurate judgment because he can call to mind 
concrete situations and estimate his reaction in these 
situations. 

A similar method is reported by Shuttleworth 1 for the
measurement of “money-mindedness.” The subjects are
asked to designate the degree in which they like or dislike 
a series of objects, activities or ideas, such as “ministry,” 
“literature,” “working-class solidarity,” or “farming.” 
The responses to some of the stimuli are found to be critical 
with reference to this particular character trait. 

1 Frank K. Shuttleworth. “A New Method of Measuring Character
Traits”; in School and Society, vol. 19, pp. 679-82. 1924.

A new technique in the study of individual differences in
prevailing trends of interest is represented in an application
of the free association test by Jennie B. Wyman.1 Two lists
of 60 words each were selected in a preliminary experiment. 
The words were then given to a group of children, who were 
ranked by their teachers on the basis of the strength of their 
intellectual interest, social interest, and activity interest 
separately. Particular response words were then rated as
indicating one or another type of interest, according as
they were given frequently by the children who were ranked
high in these interests. Thus, in the case of the stimulus
word grand, the response word noble represents predomi- 
nantly intellectual interest, while the response word journey 
represents predominantly activity interest. 
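
The keying of response words may be sketched as follows. The children, their responses, and the simple rule of assigning a word to the interest in which it was given most often are all assumptions of this sketch; Wyman's actual procedure for rating the words may have differed.

```python
from collections import Counter, defaultdict


def build_response_key(responses, high_ranked):
    """Assign each response word to the interest whose high-ranked
    children gave it most often.

    responses   -- dict: child -> response word to one stimulus word
    high_ranked -- dict: interest -> set of children ranked high in it
    """
    counts = defaultdict(Counter)
    for interest, children in high_ranked.items():
        for child in children:
            if child in responses:
                counts[responses[child]][interest] += 1
    # Ties go to whichever interest was counted first, a simplification.
    return {word: tally.most_common(1)[0][0] for word, tally in counts.items()}


# Imaginary responses to the stimulus word "grand".
responses = {"a": "noble", "b": "noble", "c": "journey",
             "d": "journey", "e": "journey"}
high_ranked = {"intellectual": {"a", "b", "c"}, "activity": {"d", "e"}}
key = build_response_key(responses, high_ranked)
print(key)
```

With these invented data the word noble falls to intellectual interest and journey to activity interest, in agreement with the example given in the text.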


3. Tests of moral attitude or judgment 


A third type of personality trait for which tests have been 
devised is the disposition which underlies the moral re- 
actions. Moral reactions may or may not be due to an 
innate disposition, but it is at least a tenable hypothesis that 
individuals differ among themselves inherently in their 
sensitiveness to moral distinctions and in their disposition to 
subject themselves to moral conventions. Apart from other 
temperamental or intellectual characteristics, for example, 
we seem, from observation and from a few preliminary 
experiments, to be justified in describing some persons as 
trustworthy and others as lacking in this desirable trait. 
What constitutes a good person will, so far as specific 
conduct is concerned, differ from age to age, and from 
nation to nation or from tribe to tribe, but conformity to 
the social code seems to rest upon sufficiently general traits 
that we may reasonably expect a person who exhibits this 
trait in one environment to exhibit it also in another. If 
this is the case, tests of moral disposition are possible. 


1 Reported by L. M. Terman. Genetic Studies of Genius, chap. xvi.
Stanford University Press, 1925. 

There are two possible types of ethical tests, or moral
tests. The one is a test of behavior. It involves placing the
individual in a situation in which he is faced with a choice 
between performing a good action and performing a bad 
action. The individual’s response in such a situation constitutes
the basis upon which he is scored. In the second
type of test the person is asked to pass a judgment regarding 
right or wrong conduct. He might be asked to state a gen- 
eral principle, or to pass a judgment concerning the relative 
merit of two acts or courses of conduct. 

It is evident that the first type of test is much more satis- 
factory than the second type if it can be set up. Very little 
experience of life is necessary to make one aware that a 
person may give lip service to a moral principle, but may 
repeatedly violate it in his conduct. One’s ability to state a 
principle, or to pass a correct judgment as to what should be 
done in a specific situation, therefore, is no guarantee that he 
would act in accordance with the right principle. It is 
difficult to set up a moral test, however, in which the 
individual shall be put in a situation which demands a moral 
choice. Because a test which demands verbal judgment is 
so much easier to administer and to devise, we may inquire 
further whether such a test may not have some value. Upon 
further consideration it appears that while a test which 
demands verbal response does not guarantee what the 
conduct of an individual shall be, it does give some infor- 
mation on the negative side of the case. If a person shows 
that he does not recognize a moral principle, we may be 
reasonably certain that he will not act in accordance with it. 
A merely verbal test may, after all then, be of some value. 

The first clear and definite attempt to test moral judgment 
was the Fernald Ethical Discrimination Test.1


1 G. G. Fernald. “The Defective-Delinquent Class Differentiating
Tests”; in American Journal of Insanity, vol. 68, pp. 524-94. 1912.

This test requires the individuals to rank ten misdeeds in
order of their gravity. These deeds are as follows, with the 
letter designations which were used for them: 


E — To take two or three apples from another man’s orchard 

P —To take a cent from a blind man’s cup 

I — To break windows for fun 

C — To throw hot water on a cat, or in any way to cause it to 
suffer needlessly

A — To break into a building to rob it 

N — To take money as “graft,” or “rake-off,” when you are a 
city or government official 

T — To try to kill yourself 

H— To ruin a nice girl and then leave her 

U — To set fire to a house with people in it 

S — To shoot to kill a man who runs away when you try to rob 
him 

The chief question which one naturally raises concerning 
a test of this sort is how far it is a measure of one’s natural 
moral attitude, and how far it measures merely the standards 
of conduct which characterize one’s environment which one 
has adopted through the teaching of the home, the street, or 
the school. It may to some degree be a measure of native
individual differences, but it is probably to a much larger 
degree a measure of class differences which are a product of 
the environment. 

The most careful study which has been made of this test 
was made by Bronner in her study of delinquent girls.1
Bronner compared the group of delinquent girls, mostly 
prostitutes, with three other groups; first, a group of college 
girls, second, a group of evening school girls, and third, a 
group of domestic servants. She tabulated the frequency
with which the various offenses were ranked as minor offenses
or as major offenses. The chief differences which she found
are between the college group and the other three groups.
The difference between the delinquent girls, the evening
school girls, and the servants is negligible. This confirms
the supposition that the test is largely a measure of environmental
conditions. The differences between the college
group and the others occur chiefly in offenses P, C and N.
The other three groups rate the offenses of taking a cent
from a blind man’s cup and throwing hot water on a cat as
serious offenses, and on the other hand, taking money as
graft as a minor offense. All the groups considered the last
four offenses as serious.

1 Augusta F. Bronner. A Comparative Study of the Intelligence of Delinquent
Girls. Teachers College Contributions to Education, no. 68. New
York, 1914.
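
A tabulation of this kind may be sketched as follows. The rankings are invented, and the rule that the five least grave places count as "minor" is an assumption of this sketch; Bronner's own division between minor and major offenses is not stated here.

```python
def minor_offense_share(rankings, offense, cutoff=5):
    """Share of subjects who place `offense` among the `cutoff` least
    grave misdeeds. Each ranking is a string of the ten letter
    designations, ordered from least to most grave."""
    if not rankings:
        raise ValueError("no rankings supplied")
    minor = sum(1 for ranking in rankings if ranking.index(offense) < cutoff)
    return minor / len(rankings)


# Imaginary rankings for two groups (invented for illustration only).
college = ["EPCIANTHUS", "EIPCANTHUS", "EICPANTHUS", "EPICANTHUS"]
servants = ["ENIPCATHUS", "EINPCATHUS", "NEIPCATHUS", "EIPCANTHUS"]

college_share = minor_offense_share(college, "N")
servants_share = minor_offense_share(servants, "N")
print(college_share, servants_share)
```

Repeating the tabulation for each letter and each group gives a table of the sort from which the differences in offenses P, C and N could be read.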

A second test of moral discrimination which Bronner used 
was the Adapted Completion Test. This test consists of a 
series of passages with blanks left to be filled in by the 
examinee. Certain of these blanks give opportunity for the 
expression of moral judgment. For example, the first 
passage reads thus: 


Mary likes pretty clothes very ______. She ______ a man who
offered to give her a new suit if she would go out with him. She
was ______ to do this and to go in this way to the theater.
That was ______.


The manner in which the last blank was filled in, in con- 
junction with completion of the blanks in the previous 
sentence, expresses a moral judgment. Eight passages of 
this nature were considered as giving fairly satisfactory 
results. In the case of some of the blanks the responses 
which were made by some of the delinquent girls gave 
evidence of the effect of their experiences. In other cases 
the differences between the four groups were not large 
enough to form the basis of a differentiation. This test, like 
the Ethical Discrimination Test, is probably more a meas- 
ure of environmental training than of innate character. 

Another test, which is primarily a measure of what a
person thinks about moral questions or acts rather than what
he does, was included by Pressey in his mental survey scale.1
Pressey presents a series of words and asks the subjects to 
classify them under one of the four categories. He gives an 
illustration of the meaning of these categories. They are as 
follows: 

No. 1. Something which must not be done in school.
No. 2. Something which means being good to other people.
No. 3. Something which will hurt one’s self.
No. 4. Something which means getting something unfairly from
some one else.


The words are as follows: 


Whispering, stealing, charity, drunkenness, passing notes, kind- 
ness, sickness, gambling, tardiness, friendliness, extravagance, 
swindling, being noisy, benevolent, disobedience, forgery, justice, 
dissipation, graft, gluttony. 


It is a question whether this is not more an intelligence 
test than a moral judgment test. 

Another test which aims to measure ethical discrimination 
by means of verbal response is the one devised by Liao.2
This test is in the form of a “best reason” test. There are 
presented a number of statements, and following them a 
number of reasons why these statements are correct. One 
of each of these sets of reasons is supposed to involve a moral 
judgment, and if the individual checks this one he is supposed 
to exhibit moral perception. For example, No. 1 reads as 
follows: 


It is wrong not to work.
1. Idle people are called lazy.
2. Idle people earn no money.
3. Idle people are discontented.
4. Idle people live on the works of others.
5. Good men tell us we should work.


No. 4 is the reason which is given credit for correct response.
Item No. 12 is as follows:


To eat more than one needs is wrong.
1. It deprives others of what they need.
2. The government urges us to save food.
3. Food is expensive.
4. Overeating injures our health.
5. It may make us gluttons.


1 S. L. Pressey. “Cross-out Tests”; in Journal of Applied Psychology,
vol. 2, p. 257 (1918), and vol. 4, pp. 97-104. 1920.

2 S. S. Colvin. “Principles Underlying the Construction and Use of
Intelligence Tests”; in Intelligence Tests and Their Use, Twenty-First
Yearbook of the National Society for the Study of Education, pp. 40-41. Public
School Publishing Company, 1922.

Another effort to measure the moral attitudes by a test of
the ideational content, or the ideational background of moral
concepts, was made by Brotemarkle.1 Brotemarkle has
classified the moral judgments under seven general heads. 
For each of these classes he has selected a pair of words
which represent the opposite extremes. Between each of
these pairs are placed seven words which are to be ranked by 
the subject in order, from one extreme to the other. For 
example, the first list is as follows: 


(Good) pure, kind, considerate, fair, mean, bad, wicked (Evil) 


The seven words represent the transition from the extreme 
of good to the extreme of evil. This series is supposed to 
represent the basic moral principle. The other pairs of 
extremes are as follows: 


purify, corrupt 
courageous, cowardly 
modest, bold 
truthful, lying 
hatred, love 
ambitious, indifferent 


1 A. R. Brotemarkle. “A Comparison Test for Investigating the Ideational
Content of a Moral Concept”; in Journal of Applied Psychology, vol.
6, pp. 235-42. 1922.

The test shows a fair correlation with the Pressey test of
moral judgment and the test of emotions, and a high cor- 
relation with general intelligence tests and college grades. 
The question, therefore, is whether the test is not, primarily, 
one of intelligence, rather than of moral attitude. 

The first set of tests which depends on behavior is that
devised and used by Voelker.1 Voelker’s tests were designed
to measure trustworthiness. They set up situations in 
which the individual has an opportunity to act in an un- 
trustworthy manner. The moral character of the test was 
disguised and the attempt was made to determine what the 
reaction of the individual would be when he did not realize 
that he was under scrutiny. The tests were represented to 
be mental tests, or tests of mental ability. 

Voelker used two series of tests. The first series was 
given to several groups of boys at the beginning of a period 
of seven weeks, and the second series to the same groups of 
boys at the end of this period. Two of these groups had re- 
ceived special training and instruction in trustworthiness as 
members of Boy Scout troops. The second series was aimed
to measure the same traits as the first, but was somewhat
modified in form. The following is the list of the tests: 


Series I.

1. Overstatement. Among other questions the boy is asked if he
received 95 in his arithmetic.

2. The M and N suggestibility test. This test was given as a
group test and is adapted from one of the tests of the Downey
Individual Will Temperament Tests. It is aimed to determine
whether the boy will stick to his remembrance of a
simple fact in the face of a contrary suggestion.

3. The “Let-me-help-you” test. This is given as a group test.
The child is shown some puzzles and is told to work them
without help. An examiner comes in later with some additional
puzzles and offers to help the children on the first one.

4. The borrowing-errand test. The child is instructed to borrow
something and is told by the lender to have it returned
promptly. His score is based on the faithfulness with which
he carries out this instruction.

5. The purchasing-errand test. The child is given over-change
to see whether he will return it or keep it for himself.

6. The tip test. The child is given a tip for some trifling favor.

7. The push-button test. The boy is told to push an electric
button every two minutes by the clock for a certain number
of times. The faithfulness with which he does this is kept
account of.

8. The crossing ‘a’ test. This is the first part of a group test.
The individual is set to crossing out a’s in two types of
material, one uninteresting and one interesting.

9. The Pintner profile test. A group test. The boy is directed
to perform this test with his eyes shut. It can only be performed
by chance without looking at it. The score is based
upon the proportion of times which the examinee reports that
he has done the test correctly.

10. A group tracing and opposites test. On the first sheet of a
folder, a boy writes the opposites of a list of words. His
responses are recorded without his knowledge on wax paper
underneath. Later he is put in a situation in which it is
possible for him to make correction in his responses and thus
unfairly raise his score.


1 Paul F. Voelker. The Function of Ideals and Attitudes in Social Education.
Teachers College Contributions to Education, no. 112. New York,
1921.

Series II.

1. Overstatement. Similar to Series I, No. 1, except that different
statements are used.

2. Truthfulness and suggestibility. Aims to determine whether
the individual checks up and contradicts a false statement
when he has the facts at his disposal.

3. Receiving help. This is a group test in which a series of problems
are presented on the first page of a booklet, and answers
are given on the back. Some of these answers are wrong so
that if the individual copies them, he can be detected.

4. Reliability. The boy is directed to deliver a letter and see
that it is answered.

5. Honesty. A letter is mailed to the boy containing twenty-five
cents, which was obviously sent by mistake.

6. Taking a tip.

7. The push-button test.

8. The ‘a’ test.

9. The cardboard test. This is similar to the profile test in its aim.
A cardboard containing five circles is presented and the subject
is instructed to touch them in turn with his eyes shut.
If he reports that he has touched them all he is marked as
having failed.

10. A completion test. Similar in nature to the tracing and opposites
test of Series I.

The evidence which is presented in Voelker’s study seems 
to indicate that the tests have some degree of validity as 
measures of moral reaction. The groups were judged as to 
their trustworthiness by teachers and other adults who were 
acquainted with them. Correlations were found between
the ratings of these judges, and also between the judges’ rat- 
ings and the test. In some cases the correlations were low, 
but they were in all cases positive. Correlations between 
the ratings of the judges and the tests were sufficiently high 
to indicate a considerable agreement between them. 

The other type of evidence concerns the relative gains 
made by the two experimental groups who were given special 
instruction and training, and two other groups who were 
used as control groups. The experimental groups made a 
higher score in the second test than they did in the first by 
13.5 per cent and 9.9 per cent, respectively. The two control 
groups made a lower score by 7.6 per cent and 10.2 per cent, 
respectively. The actual loss in scores of the control groups
is attributed to the greater difficulty of the second series. 
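
The bearing of the control groups upon these figures may be shown by a little arithmetic. The sketch below uses only the four percentages given above; the unweighted averaging is an assumption, since the sizes of the groups are not given here, and the variable names are merely illustrative.

```python
# Percentage changes in score from Series I to Series II, as reported above.
experimental = [13.5, 9.9]
control = [-7.6, -10.2]


def mean(values):
    """Plain arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Unweighted means; weighting by group size is not attempted here.
mean_experimental = mean(experimental)
mean_control = mean(control)
# Gain of the trained groups over and above the change in the controls.
relative_gain = mean_experimental - mean_control
print(round(mean_experimental, 1), round(mean_control, 1), round(relative_gain, 1))
```

If the loss of the control groups reflects only the greater difficulty of the second series, the trained groups stand about 20.6 percentage points above where they would otherwise be expected to fall, rather than the 11.7 points which their raw gain alone suggests.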

A modification of three of Voelker’s tests, an adaptation
of Pressey’s test of moral judgment, and a modification of
Woodworth’s questionnaire for the discovery of psychopathic
tendencies were tried out by Cady 1 comparatively on
corrigible and incorrigible boys. He found these tests somewhat
diagnostic of incorrigibility.

1 Vernon M. Cady. The Estimation of Juvenile Incorrigibility. Journal of
Delinquency, Monograph no. 2. Whittier State School, 1923.

Seven somewhat similar tests were worked out by 
Raubenheimer¹ with the especial purpose of comparing the 
moral reactions of gifted children with those of children of 
ordinary intelligence. They are as follows: 


1. Overstatement A. (Modified from Voelker.)

2. Overstatement B. (Modified from Voelker.)

3. Questionable reading preferences.

4. Questionable character preferences.

5. Social attitudes. A multiple answer test designed to bring
out the attitude toward chums, boy scouts, teachers, etc.

6. Activity preferences, similar to 5.

7. Rating the seriousness of offenses.

Gifted children were found to be superior in these tests. 
The behavior tests appear on the whole to be more satis- 
factory than the judgment tests. They at least merit 
further investigation. The tests should be simplified in 
administration and in scoring. It is possible that tests of 
similar general nature could be devised to measure other 
aspects of moral reaction than the one of trustworthiness. 


4. Tests of æsthetic sensibility 


Finally, we may distinguish among the personality traits
æsthetic sensibility. There is, perhaps, greater reason to
doubt that æsthetic susceptibility exists as a general trait
than in the case of the other three types of personality
traits. It may well be argued that a person may have
delicate susceptibility to degrees of merit in painting and not
in music. He may appreciate beauty in literature and not
in architecture. This raises one of the problems to be
investigated. If it is possible to devise tests which will
measure susceptibility in each of the special fields of æsthetic
appreciation, it will then be possible to determine whether
this trait is a general one or whether it is made up of a
number of specific elements.

1 The investigation is summarized by L. M. Terman, in Genetic Studies of
Genius, chap. xvi. Stanford University Press, 1925.

The tests of æsthetic appreciation are very meager. The
author has only two to report which fall under the definition
of tests which has been adopted in this book. The first of
these was devised by Thorndike, and aims to measure the
appreciation of form and design and the appreciation of
poetry.¹ The tests of form and design require the individual:
(1) to rank in order of their beauty a series of rectangles
which differ in width but are equal in length; (2)
similarly to rank another series of rectangles; (3) to rank a 
series of crosses which have the upright the same length and 
the cross-bar differing in length and in position; (4) to rank a 
series of crosses with the cross-bar the same length but in 
different positions; (5), (6), and (7) to rank a series of 
designs, consisting of pairs of upright lines, either between 
horizontal lines or in rectangles. The correct ranking of each 
of the figures is based upon the statistics of the judgments 
of college students. 

The test of appreciation of the quality of a line of poetry is 
given as follows: The individual is presented with a line of 
verse and with a number of other lines, each one of which 
might be used to complete the couplet. He is to judge 
which of them completes it the best. An example is the 
following. 


First line: But still he only saw and did not share.
Completion lines:

He could not tell us what he did there.
Though longing to join her did not dare.
He merely felt, but did not care.
He feared his virgin deed to do and dare.
The other’s pleasure, nor did he care.
His saddened heart, still wand’ring was not there.
Her varied toil, her deep and heavy care.


These completed lines are placed in the order from poorest to
best on the basis of a ranking by sixty adults.

1 Edward L. Thorndike. “Tests of Æsthetic Appreciation”; in Journal of
Educational Psychology, vol. 7, pp. 509-22. 1916.

The other æsthetic test is designed to measure the
appreciation of pictures.¹ Thirty-six Cosmos prints were
used. The standard ranking of the pictures was first de- 
termined on the basis of the judgment of three experts in the 
department of art. They were then given to 144 women 
students to rank. The correlation of the entire group with 
the rankings of the experts was .33. The students were then 
classified according to their interest and training in art.
The group which had the most training and interest ranked
the pictures in an order which gave a correlation of .49. 
The correlation in the case of the group which had interest 
but no training was .43 and of the group which had no
interest or training, −.11. The individual correlations varied
from .82 to −.42. The placing of some of the pictures appeared
to be particularly significant. For example, the
correlation between the rank which the different individuals 
assign to “Young Lady with a Pearl Necklace” and the 
standing of the individuals was .78. This apparently 
indicates the importance of the appreciation of technique, as 
distinguished from the narrative interest, in determining the 
conventional artistic judgment. 
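
The comparison of a student's ranking with the standard ranking can be illustrated with a rank correlation. The report quoted here does not say which coefficient was computed, so the sketch below assumes Spearman's rank-difference coefficient for rankings without ties; the function name and the six-picture rankings are hypothetical and are not data from the study.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank-difference correlation for two rankings without ties.

    Each argument lists the rank (1 = best) given to the same items,
    in the same item order. Assumption: no tied ranks.
    """
    if len(rank_a) != len(rank_b):
        raise ValueError("rankings must cover the same items")
    n = len(rank_a)
    if n < 2:
        raise ValueError("at least two items are needed")
    # Sum of squared rank differences, item by item.
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical rankings of six pictures (not taken from the study).
expert_ranking = [1, 2, 3, 4, 5, 6]
student_ranking = [2, 1, 3, 5, 4, 6]

print(round(spearman_rho(expert_ranking, student_ranking), 3))
```

A coefficient near 1 would mean that the student places the pictures in nearly the experts' order; a negative coefficient, such as some of the individual values reported above, would mean that the order tends to be reversed.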

The tests in the field of non-intellectual attitude and functions
are still chiefly in the realm of experimental development.
A few of them are well enough worked out to be tried
tentatively in the field of practical application. The interpretation
of their results, however, should be made with
great caution. Enough has been done to indicate that we
may be able ultimately to test the non-intellectual functions
as well as we now can test the intellectual ones. The purpose
of this chapter has been to point out the directions in
which the experiments in the measurement of personality
traits are going, to indicate the principles which underlie
testing as applied to this field, and to illustrate by means of
concrete examples the kinds of technique which are being
tried out.

1 J. Cattell, J. Glascock, and M. F. Washburn. “Experiments on a Possible
Test of Æsthetic Judgment of Pictures”; in American Journal of
Psychology, vol. 29, pp. 333-36. 1918.


CHAPTER IX 
TECHNIQUE AND THEORY OF MENTAL TESTS 
I. SUBJECT-MATTER OF TESTS AND RELATED PROBLEMS 


THE principles of technique and theory which are discussed 
in the present chapter will be taken up in connection with 
the practical problems in which they arise. These practical 
problems are met with in the two situations of designing 
mental tests, on the one hand, and administering them, on 
the other. Most of the theoretical and technical questions 
concerning mental tests arise in either one or both of these 
two situations. Some of them also arise, it is true, when we 
attempt to interpret the results of mental tests. There is 
some overlapping between the questions which come up in 
the design and administration of tests and in their interpre- 
tation. In the present chapter we shall deal with these 
questions primarily from the point of view of design and 
administration. In a later chapter we shall approach the 
problems from the point of view of interpretation. In this 
later discussion it will be possible to assume familiarity with 
the treatment of the problem which we shall make in the 
present chapter. 

The approach to the technical and theoretical problems 
from the point of view of design and administration does not 
mean that one may not be concerned with these problems 
except as he intends to design a mental test, or even to ad- 
minister it. The subject is approached in this way primarily 
because it constitutes a convenient method of organizing the 
questions which will be taken up. In addition to this ad- 
vantage, this mode of approach will be serviceable in case 
one wishes to design a test, or to examine a test from the 
point of view of its conformity to the technical requirements.


1. Selection of subject-matter


The first problem which we meet either in designing or 
judging a mental test is concerned with subject-matter. By 
subject-matter we mean the content of the test. We may 
think of the content from one of three points of view. We 
may either think of the material of which a test is composed, 
such, for example, as a list of words to which opposites are 
found, or a list of words which must be defined, or a list of 
arithmetic problems which must be solved, and so on. On 
the other hand, we may think of subject-matter from the 
point of view of mental process, or mental capacity, which 
the test is designed to measure. Thus we may think of a 
test as measuring memory or discrimination between the 
pitch of sound or the tones of color, or the ability to associate, 
or the ability to reason, etc. Or, in the third place, we may 
designate the test, not in terms of the objective material of 
which it is composed, or of the mental process which it is 
supposed to measure, but of the operations which this 
individual goes through in attempting to pass the test. This 
point of view may be regarded as intermediate between the 
other two. Whether we define the subject-matter in one or 
the other of these three ways, the problem is the same:
What is the best subject-matter for a given purpose?

We have already met this question repeatedly in our 
historical account of the development of tests, and in our 
description of the various types of tests which are in common 
use. The function of our present discussion will be to bring 
together in an organized whole the problems which are re- 
lated to the subject and which have been touched upon in a 
piecemeal way in the previous chapters. At the outset of 
our discussion of subject-matter we must draw a general 
distinction. This is the distinction between a test of a 
special mental process or of special capacity, on the one 
hand, and a test of general capacity, on the other. The
subject-matter of the test will be determined, in the first
place, according as our purpose is to measure special capac- 
ity or general capacity. This is true, even though we should 
come to the final conclusion, as some do, that a test of 
general capacity is merely a collection of tests of special 
capacity. Even if this assumption is true, it is still necessary 
to determine what collection of special capacities is satis- 
factory as a measure of general intelligence or general 
capacity. It is necessary to make the distinction, also, even 
though it might turn out that certain general capacities are 
to be identified with particular special capacities. This 
seems, in fact, to be the theory which is held by some. On 
this hypothesis it is necessary to determine which of the 
special capacities have a general significance. Let us 
assume for the moment that it is possible to make this dis- 
tinction, and return at a later point to the question of what 
general capacity is, and how it may be measured.


2. Subject-matter in tests of special capacity 


It has already been pointed out that the early tests were 
designed chiefly to measure a variety of special capacities. 
During the development of tests the emphasis has shifted 
more and more to the measure of general capacity or general 
intelligence. In spite of this shift in emphasis, it is still 
necessary for certain purposes to measure special traits as 
distinguished from general ones. This is particularly true in 
tests for vocational selection and guidance. Not all voca- 
tional tests are of the specialized sort, but some of them 
clearly are. Perhaps the best examples of such tests are the 
Seashore Music Tests. Other examples may be found in the 
various monographs which describe the construction of tests 
to measure aptitude for particular jobs. Examples will be 
mentioned in the chapter on vocational tests. It may be 
possible, also, to distinguish special abilities or disabilities in
the school, and to measure them by means of tests. Some
attempts have been made to do this as, for example, that by
Bronner.¹ It is well known, for example, that certain children,
though normal in intelligence, have great difficulty in
learning to read. The only tests which have thus far been
devised to measure satisfactorily this disability are reading
tests themselves. We may eventually be able to analyze the
ability required to read and to devise tests to measure the
disability, even before the child has attempted to read, or
apart from the reading activity. Special tests may also be
used in making a study of racial differences, or of differences
between individuals in hereditary aptitudes, or in measuring
the effect of environment and training.

How, then, must we proceed to the selection of the subject- 
matter for a specialized test? Obviously the first step is to 
locate the ability, and to attempt to define and analyze it. 
Sometimes, although the ability may rightly be described as 
specialized, it is by no means simple, or unanalyzable. 
Musical ability, for example, has been analyzed by Seashore 
into some thirty components. It includes, for example, 
pitch discrimination, discrimination of the intensity of 
sound, recognition of rhythm, recognition and memory of 
melody, discrimination between different degrees of har- 
mony, motor dexterity in musical performance, control of 
the voice, musical appreciation, and many others. There is 
good evidence that not even these elementary capacities, or 
specialized capacities, are simple and unanalyzable. At 
least the capacities as we measure them cannot be regarded 
as ultimate units of mental ability. If we measure discrim- 
ination of intensity by various instruments, for example, 
we find a variation in the result. This means that discrimination
of the loudness of one kind of sound is not exactly the
same as discrimination of the loudness of another kind of
sound. We may show from another point of view, also, that
these tests, as they are ordinarily given, do not measure single
unitary capacities. The ability to respond to any one
of these sensory tests involves not only the capacity to discriminate
the sensations themselves, but also the ability to
pay attention and the willingness to pay attention.

1 Augusta Bronner. Psychology of Special Abilities and Disabilities.
Little, Brown & Co., Boston, 1917.

Probably we must regard this as a somewhat generalized 
mental attitude. A specialized test, therefore, is to some 
degree at least, a general test. When the psychologist 
confesses that he does not know exactly what it is that is 
being measured by the test which he uses, the statement does 
not mean that he has not made an effort to analyze the 
mental processes, but it means that he has found them to 
involve a complexity which he has not yet been able 
completely to resolve. It is the person who describes 
quite confidently the mental processes which are measured 
by his test who displays his ignorance of the subject. We 
cannot, of course, ultimately rest content with the failure to 
designate what it is that is measured by a particular test. 
Our future research must be directed very largely toward this 
problem. 

In the meantime, we can meet our practical needs in two 
ways. In the first place, we can make a provisional analysis 
and give a provisional description of the mental process 
which we believe to be measured by the test. In the second 
place, we can take as the units which are to be measured the 
activities which are required in particular practical situa- 
tions of life. Thus, if we cannot with clearness isolate and 
define memory, or if we find that memory is actually a 
composite of many simpler functions, we may, at least, 
measure the ease and rapidity with which a person learns 
poetry, or with which he memorizes telephone numbers, or 
the names of persons, and so on. In the army a group of
psychologists wished to devise tests to measure the aptitude
of recruits for learning to fly. They took as the measure of 
flying ability, not an abstract measure of any sort, but the 
actual rapidity with which an individual learned when he 
undertook to master the aeroplane. They then selected tests 
to measure the ability in question by trying out a number 
which seemed likely to be successful, and then by correlating 
them with rapidity of learning determined empirically which 
ones proved to be successful. 
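
The empirical procedure just described, trying out a number of likely tests and keeping those which correlate with the practical criterion, may be sketched as follows. This is only an illustration under stated assumptions: the product-moment coefficient is assumed as the measure of correlation, the threshold of .40 is arbitrary, and the test names and scores are hypothetical rather than figures from the army work.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two score lists of equal length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary for a correlation to exist")
    return sxy / sqrt(sxx * syy)


def select_tests(candidates, criterion, threshold=0.40):
    """Correlate every candidate test with the criterion and keep the
    tests whose correlation reaches the (assumed) threshold."""
    validities = {name: pearson_r(scores, criterion)
                  for name, scores in candidates.items()}
    ordered = sorted(validities.items(), key=lambda kv: kv[1], reverse=True)
    kept = [name for name, r in ordered if r >= threshold]
    return validities, kept


# Hypothetical data: rapidity of learning for six recruits (the criterion)
# and their scores on two hypothetical candidate tests.
learning_speed = [3, 5, 2, 8, 7, 4]
candidates = {
    "test_a": [30, 52, 25, 79, 68, 41],
    "test_b": [5, 5, 6, 4, 6, 5],
}

validities, kept = select_tests(candidates, learning_speed)
for name, r in validities.items():
    print(name, round(r, 2))
print("retained:", kept)
```

Only the tests which prove themselves against the criterion are retained; the analysis which suggested them in the first place carries no weight in the final selection.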

We see that the attempt to design tests of specialized 
ability may proceed from two purposes. In the first place, 
the aim may be to make a scientific and accurate analysis of 
the components of mental capacity, or of the various special- 
ized mental capacities. In the second place, the aim may be 
to measure the aptitude which is required to perform some 
particular activity in practical life. The theoretical or 
scientific problem is by far the more difficult, and we have 
made little progress toward its solution. The practical 
problem, because it requires a test or group of tests which 
will work in a particular set of circumstances only, is much 
easier of solution, and we may comment further upon the 
procedure which has been found useful in pursuing it. 

The second step, after we have defined the ability which 
is to be measured, is obviously to invent some means of 
measuring the ability in question. The method which has 
sometimes been used is that of analysis. The psychologist 
attempts to define to himself in psychological terms the 
nature of the ability, and then to assemble, or to invent, tests 
which may be assumed to measure the capacity which is thus 
analyzed. The problem is approached, in other words, from 
the a-priori point of view. On the other hand, the attempt 
may be made from a purely empirical point of view. The 
experimenter may try out one test after another without 
having any particular reason to expect that one test will be
more successful than another. He then selects the test or
tests which prove by experience, or empirically, to work, or 
to be successful. If several tests have been found by this 
procedure to be moderately successful, they may be com- 
bined into a team of tests. 

The method which is most likely to be successful, and to 
reach the solution most quickly, is a combination of these 
two. No experimenter, in fact, ever proceeds in a purely 
random fashion. He makes a rough guess at the tests which 
he thinks will be successful and then proceeds to try them 
out. He sometimes wastes time, however, by not making as 
careful preliminary analysis as he might. This careful 
analysis may lead either to the trial of tests which otherwise 
might not be thought of, or to the invention of the tests 
which may be more satisfactory than any that are in exist- 
ence at the time. 

The opposite error to that of failing to make a careful 
preliminary analysis is the error of resting content with 
analysis and assuming that a test will be successful without 
finding its correlation with some outside measure of achieve- 
ment. Many examples of this error could be found in the 
earlier period of testing. It is relatively infrequent, how- 
ever, at the present time. The correct procedure may be 
illustrated in the field of typewriting. If we wish to devise a 
test to measure aptitude for learning to use the typewriter, 
we may first, through whatever psychological insight we 
may avail ourselves of, assemble or devise a series of tests. 
Each one of these tests must then be put to a trial by giving 
it to a group of individuals, at the same time measuring the 
rapidity and ease with which they learn to use the typewriter,
and then finding the correlation between their standing
on the test and their score in learning.


3. Selection of subject-matter for tests of general intellectual
capacity: The existence of general intelligence


Before discussing the kinds of tests which are adapted to 
measure general capacity, and before indicating what the 
tests of general capacity should be like, it is pertinent to 
raise the prior question whether general intellectual capacity 
exists. The existence of general intellectual capacity is not 
universally accepted, and there is considerable debate con- 
cerning its nature among those who do accept its existence. 

We may distinguish three general conceptions regarding 
the existence of general intelligence. The first is that there 
is such a thing as intellectual capacity which enters to a 
greater or less degree into the performance of all kinds of 
intellectual work, or which constitutes a factor in every type 
of intellectual reaction to the world about us. This general 
factor is the same whatever practical situation it appears in, 
or whatever other intellectual factor it is associated with. 
It is not the sole factor in intellectual achievement, but it is 
the most important one. 

A second view is that there are a few types of general 
intelligence, or general intellectual capacity. There is one 
type which is prominent in meeting one kind of practical 
situation, another type which is prominent in meeting an- 
other type of situation, and so on. On this view we might 
speak, for example, of abstract intelligence, of the ability to 
deal with persons, and of concrete intelligence, or the ability 
to deal with things. 

According to the third view, the mental capacity required 
to deal with any situation is unique, and differs from that 
required to deal with any other situation. While there may 
be common elements in the compounds of capacities which 
are required to meet various situations, there is no such 
thing as general capacity. There are simply bundles or 
groups of particular capacities.

Before going on to a further discussion of the various
views regarding general intelligence and its nature, we may 
further define the question at issue. The question may be 
stated in this way: Does there exist an intellectual trait or 
characteristic, or mode of behavior, which, when it is 
possessed in high degree, renders the person’s intellectual 
performance efficient throughout the entire realm of intel- 
lectual work; or, on the other hand, is the grade of a person’s 
intellectual performance in one type of work entirely independent
of the grade of his performance in another type of
work, except for the accidental presence in both performances
of one or more of the numerous factors of which
intellectual capacity is made up?

We may first trace the development of the general intelli- 
gence hypothesis, and see how it has worked out in the 
design and organization of mental tests. The earlier students 
of mental tests who wished to measure general intellectual 
capacity sought for some single test that would measure 
a particular capacity. That is, they identified general 
intelligence with one of the particular mental functions. 
There are several examples of this procedure. Spearman, in 
his early studies of correlation, found a certain degree of 
correlation between various tests of sensory discrimination. 
He therefore concluded on this meager evidence that general 
intelligence consists of fineness of discrimination. He de- 
scribed the difference between high intelligence and low 
intelligence, figuratively, by comparing the former with 
highly tempered steel and the latter with iron. Ebbinghaus, 
it will be remembered, sought a measure of general intel- 
lectual capacity in the combining or associating process, and 
devised his completion test as a means of measuring this 
process. Binet, in some of his earlier experiments, chose 
attention as probably the most essential aspect of intelligence,
and devised a series of tests to measure attention.

As the experience with mental tests accumulated, it became
more and more difficult to identify general intelligence
with any one particular mental capacity. It was found that 
a number of tests proved to be successful, as measured by 
their correlation with criteria or other measures of intelli- 
gence. It was found, furthermore, that a combined score of 
a series of tests usually gave a higher correlation with the 
criteria than did the score from a single test. This has led 
to an attempt on the part of those who still define general 
intelligence as a common capacity to describe it in more 
general terms. Possibly the first to formulate a clear de- 
finition of intelligence from this point of view was Stern. 
Stern defined intelligence as the general mental adaptability 
to new problems and conditions of life. At about the same 
time Burt, on the basis of his studies of correlation, defined 
the central factor as “the power of readjustment to rela- 
tively novel situations by organizing new psycho-physical 
coördinations.” Similarly Binet, as quoted by Terman, 
describes intelligence as “ (1) the tendency of thought to take 
and maintain a definite direction, (2) the capacity to make 
adaptations for the purpose of attaining the desired end, and 
(3) the power of self-criticism.” Again, Colvin described 
intelligence as capacity to learn. These definitions agree in 
a general way with the description of intelligence by James, 
from the point of view not of mental tests, but of a descrip- 
tion of the mental processes and their development. James 
described intelligence, in contrast to instinct and habit, as 
the adaptation to novel conditions by the variation of behavior.¹

1 The definition of intelligence will be discussed more fully in Chapter
XVIII.

Such descriptions of intelligence in terms of a single formula
have been influential in guiding the practice of the
development of intelligence tests. The application which is
made of this hypothesis is that the various tests which are
good measures of intelligence should have a high correlation 
with one another. The notion underlying this application 
is not that a particular test is good because it measures a 
specialized function which is a component of intelligence, but 
because it measures more completely this general function 
which is characteristic of all intellectual activity. The dis- 
tinction may seem a little abstruse at the outset, but it 
touches upon a real difference in different points of view 
regarding mental tests. Our most successful mental tests 
are now recognized as not measuring specific and definite 
mental functions. For example, the tests which are charac- 
teristic of our intelligence scales cannot be classified as 
specific measures of memory, or perception, or association, 
or reasoning, or what not. We may ultimately develop tests 
of this character. At the present time, however, every one 
of our tests must be recognized as measuring a variety of the 
particular mental processes. The theory which is here being 
outlined involves the notion that these tests, in addition to 
requiring the activity of these particular mental processes, 
require also the activity of a general function which is 
brought into play in all the higher types of intellectual 
performance. 

In support of this general theory has been brought the fact 
concerning the intercorrelation of mental tests referred to in 
an earlier chapter by the term, the hierarchy of intelligences. 
The evidence for the existence of a general factor, based upon 
intercorrelation, will be discussed more fully in Chapter 
XVIII. 

The general theory of intelligence, particularly as it re- 
lates to the intercorrelation between tests, has a practical 
bearing upon the construction of tests, particularly with 
reference to the selection of subject-matter. Furthermore, 
the experience which is obtained in the design of tests has a
reflex bearing on the theory, as we shall see. We may take
as our example of this point the experience of psychologists 
in devising the army tests. 

The theory that there exists some sort of general factor 
would seem to imply that those tests which have the highest 
intercorrelation would be the best measures of intelligence, 
and that tests of intelligence should be selected partly upon 
this ground. In the early stages of the work with the army 
tests a contrary theory was adopted, and for a time was 
followed. This theory is expressed in the following words:¹


The general principle is that the lower any particular test correlates
with them, the greater weight it should have in the composite.
For in the proportion that two tests intercorrelate closely,
they are repetitive — i.e., are measures of the same fact — and a
high weight to each of them will mean an undue weighting of the
same fact. The lower the correlation of this fact with the fact to
be prophesied, the more excessive would the weighting be.

1 Psychological Examining in the United States Army, p. 316.


As a result of their experiments in the design of the army
test, however, the authors have the following to say:

A test which will not correlate thoroughly well with the total
score of a good battery of tests is ipso facto under grave suspicion;
there is little likelihood that it will consistently correlate well with
any other proved measure of intelligence (p. 338).


We shall see in a moment from the statistical evidence that
this statement is correct.

In weighing this issue, we must distinguish two points of
view, the mathematical and the psychological. From the
mathematical point of view it may be proved that the guiding
principle which was adopted at the beginning of the
army testing work is correct. That is, if each one of a
battery of tests correlates to a certain degree with a criterion,
then the lower the intercorrelation between the tests, the
higher will be the correlation of the composite score of all the
tests with the criterion.¹ This is a purely mathematical fact
which has nothing whatever to do with the psychological 
make-up of intelligence. The psychological question, how- 
ever, is an entirely different one. It is this: Are there tests, 
as a matter of fact, which have a low intercorrelation and 
which correlate to a high degree with a criterion; in other 
words, can tests be found which meet this mathematical 
desideratum? If they cannot be found, then it will be 
necessary to sacrifice either the requirements of a high 
correlation with the criterion, or of a low intercorrelation. 
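
The mathematical point may be made concrete with a small computation. The sketch below does not reproduce the formula of the work cited; it assumes the simplest case, in which every test has the same correlation with the criterion and every pair of tests has the same intercorrelation, and it uses the standard expression for the correlation of an unweighted sum of standardized scores with an outside variable. The function and parameter names are hypothetical.

```python
from math import sqrt


def composite_validity(n_tests, validity, intercorrelation):
    """Correlation of an unweighted composite of n_tests standardized tests
    with a criterion, assuming every test correlates `validity` with the
    criterion and every pair of tests correlates `intercorrelation`.
    """
    if n_tests < 1:
        raise ValueError("at least one test is required")
    # Variance of the sum of n standardized scores with equal intercorrelation.
    variance = n_tests + n_tests * (n_tests - 1) * intercorrelation
    if variance <= 0:
        raise ValueError("intercorrelation too low for a valid battery")
    r = n_tests * validity / sqrt(variance)
    if abs(r) > 1 + 1e-12:
        # No real set of tests can show this combination of correlations.
        raise ValueError("validity and intercorrelation are inconsistent")
    return r


# Four tests, each correlating .40 with the criterion.
for r_between in (0.0, 0.2, 0.6, 1.0):
    print(r_between, round(composite_validity(4, 0.40, r_between), 3))
```

Under these assumptions the composite of four tests correlates .80 with the criterion when the tests are uncorrelated with one another, and only .40, no better than a single test, when they are perfectly intercorrelated. The error raised for inconsistent values is a reminder that not every combination of high validity and low intercorrelation is even mathematically possible; whether the possible combinations can be found among actual tests is the psychological question.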

The psychological fact is clear, and can be demonstrated 
from the army test results themselves. It is displayed in 
Table X, which has been put together from the various parts 
of the army test report. It contains, on the one hand, the 
intercorrelation of the various individual tests of the Army 
Scale A, which was preliminary to Scale Alpha, and also the 
correlation between these various tests and criteria. The 
references to the sources of these figures in the army report 
are given in the table. Following the presentation of the 
various correlations with criteria, and of the intercorrelations 
of the tests, are given the rank orders of these sets of 
correlations. The rank order of the correlations with 
criteria and the rank order of intercorrelations are each 
combined into a composite rank order. The essential com- 
parison is between the composite rank orders. It is very 
evident from an inspection of these data that a test which 
has a high correlation with criteria also has high inter- 
correlations, and a test which has a low correlation with 
criteria has low intercorrelations. The two composite rank 
orders are almost identical. 
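
The comparison of composite rank orders can be sketched as follows. This is an illustration only: the coefficients are hypothetical values for four tests, not the figures of Table X, and the use of a rank correlation to summarize the agreement of the two composite orders is an assumption added here, not a step reported in the text.

```python
def average_ranks(values):
    """Rank values from highest (rank 1) to lowest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def composite_rank(rows):
    """Sum each test's ranks over several rows of coefficients, then rank the sums."""
    per_row = [average_ranks(row) for row in rows]
    totals = [sum(column) for column in zip(*per_row)]
    # A small total means a consistently high standing, so negate before ranking.
    return average_ranks([-t for t in totals])

def rank_correlation(ranks_a, ranks_b):
    """Pearson correlation computed on two sets of ranks."""
    n = len(ranks_a)
    mean_a = sum(ranks_a) / n
    mean_b = sum(ranks_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(ranks_a, ranks_b))
    var_a = sum((a - mean_a) ** 2 for a in ranks_a)
    var_b = sum((b - mean_b) ** 2 for b in ranks_b)
    return cov / (var_a * var_b) ** 0.5

# Hypothetical coefficients for four tests (decimal points omitted).
criteria = [[34, 48, 46, 30], [41, 36, 46, 30]]
intercorrelations = [[47, 58, 61, 40], [52, 62, 66, 45]]
print(rank_correlation(composite_rank(criteria), composite_rank(intercorrelations)))
```

Tied coefficients receive fractional ranks, which is why values such as 2.5 and 3.5 can appear in a table of rank orders.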

It would appear from these facts, then, that for the pur- 


1 This is a deduction from Spearman’s formula given in, C. Spearman, 
TABLE X. THE RELATION BETWEEN THE CORRELATION OF 
TESTS WITH CRITERIA AND THEIR INTERCORRELATION 
IN THE CASE OF ARMY TEST A 


CORRELATIONS WITH CRITERIA 

Tests: 1. Oral directions; 2. Memory span; 3. Disarranged sentences; 
4. Arithmetic; 5. Information; 6. Opposites; 7. Pract. Judgm.; 
8. Number completion; 9. Analogies; 10. Number comparison 

1. Officers’ rating of 313 National Guard | 315 | 34 | 48 | 46 
2. Officers’ rating of 338 Men | 331 | 41 | 36 | 30 | 46 
3. Mental age, average of 8 groups | 332 | 47 | 36 | 49 | 59 
4. Trabue B and C, 287 pupils | 337 | 60 | 39 | 55 | 65 
5. Grade location | 337 | 49 | 40 | 56 | 67 

INTERCORRELATIONS 

1. Average intercorrel. 313 National Guard | 58 | 47 | 54 | 61 | 64 | 61 | 56 | 49 
2. Average intercorrel. 895 Engineers | 62 | 52 | 57 | 66 | 66 | 67 | 60 | 55 


RANK ORDER OF TESTS IN CORRELATIONS WITH CRITERIA 

1. | 3 | | 1 
2. | 10 | 2 | 3 
3. | 6 | 2.5 | 5 
4. | 8 | 3.5 | 3.5 
5. | | | 2 
Total | 35 | 17.0 | 14.5 
Composite rank | | | 2 

RANK ORDER OF 

Total 

Composite rank 


1 Decimal points are omitted from these coefficients. 
pose of selecting subject-matter for an intelligence test the 
intercorrelation between the tests results in practically the 
same selection as does the correlation between the tests and 
criteria. Furthermore, it is psychologically impossible to 
select tests which have a high correlation with criteria and 
yet which have a low intercorrelation. This being the case, 
and since it is obvious that one must select tests which have 
a high correlation with criteria, we must abandon the de- 
mand, which rests upon mathematical considerations, that 
the test shall have a low intercorrelation. 

The facts which have just been referred to, in addition to 
the hierarchy of test coefficients, support the hypothesis of 
some sort of general factor in intellectual ability. It would 
seem, therefore, that we are justified in regarding intelligence 
as in the nature of a unitary, though possibly complex, 
mental trait, or characteristic. The variation in the achieve- 
ment or productiveness of individuals in the various types of 
human activities may be ascribed in part to differences in 
intelligence, and in part to differences in other mental traits 
than intelligence. 

It might perhaps be concluded from a logical application 
of the principle of the hierarchy of abilities that our best 
procedure would be to find some one test which has the 
highest intercorrelation and the highest correlation with 
criteria, and rely upon this alone. Experience has shown, 
however, that groups of tests give better measures than any 
single test. How is this to be accounted for? 

The fact that no test is a perfect measure, and that groups 
of tests are better than single tests, may be accounted for 
on the ground that every test involves certain irrelevant 
factors as well as the central factor which we are attempting 
to measure. To put it in another way, every test measures 
other capacities in addition to intelligence. Intelligence 
always operates upon material. We think in terms of our 
experience, and not in terms of purely abstract ideas unre- 
lated to the world of experience. Our ability to deal with 
any materials of thought, then, will depend, not simply upon 
thinking ability in the abstract, but upon our familiarity 
with and our ability to deal with particular materials with 
which the thinking is to be carried on. For example, 
thought may be carried on in terms of language. It may be 
carried on again in terms of mathematical symbols or in 
terms of mechanical relationships. The thought activity 
which is carried on with these different types of materials, or 
with these different modes of expression, may be the same in 
its general character. The skill and ease with which an in- 
dividual may carry on a train of thought, however, will 
depend, in part, upon the nature of the materials and his 
adaptation to them. Take a concrete example. A lawyer 
is able to reason in terms of legal facts and principles; a 
physician is able to reason in terms of medical facts and 
medical laws; the engineer can reason in terms of the physi- 
cal relationships of material things and their laws. The 
thought process may be abstractly the same in all these 
cases, but the mental process is colored by the material of 
thought as well as by the form of thought. 
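
The advantage of a group of tests over a single test can be put in arithmetical form. The following is a minimal sketch, assuming that each test score is the sum of the central factor and an irrelevant factor peculiar to that test, and that the irrelevant factors of different tests are independent of one another. The function name and the variances are illustrative assumptions.

```python
import math

def correlation_with_central_factor(n_tests, irrelevant_variance):
    """Correlation between the mean of n tests and the central factor.

    Each test is taken to be (central factor) + (irrelevant factor), with
    the central factor of unit variance and independent irrelevant factors
    of equal variance.
    """
    return 1.0 / math.sqrt(1.0 + irrelevant_variance / n_tests)

for n_tests in (1, 4, 10):
    print(n_tests, round(correlation_with_central_factor(n_tests, 1.0), 3))
```

With irrelevant factors as variable as the central factor itself, a single test correlates about .71 with the central factor, four tests about .89, and ten tests about .95. The irrelevant factors tend to cancel in the average only in so far as they differ from test to test, which is the reason for varying the material.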

If the distinction between the material and form of thought 
which has just been drawn is correct, we have a justification 
for the use of a variety of tests. The need for this variety is 
due, not so much to the fact that different mental processes 
are measured by them, as that each of them measures the 
mental activity only as it appears in certain concrete opera- 
tions of thought, and that these different concrete opera- 
tions are conditioned partly by their material embodiment, 
as well as by their form. 

If we carry this line of thought a step farther, we are led 
to raise the general question of the validity of our intelligence 
tests. The question is whether or not all of our tests 
are limited by the fact that they deal with a certain 
restricted range of thought material. Within this range of 
material, they have demonstrated validity. Their range 
has been limited chiefly, however, to the realm of school 
activity. Would they have the same validity if they were 
applied in a variety of other situations outside the school? 
Is school intelligence the same as life intelligence? 

We must undoubtedly answer this question by saying that 
the measure of achievement in the school is not identical 
with the measure of achievement in other situations outside 
the school. To say that it is not identical, however, does not 
mean that there is no relationship. The degree of closeness 
of this relationship we shall have to consider in dealing with 
the practical application of tests. For the present we may 
say that the relation of the test scores to school success is 
closer than its relation to success in life outside. This is due 
to the fact that the materials of the test are largely materials 
of school work. How, then, shall we interpret this lack of 
identity? Shall we say that there is a school intelligence 
and that there is a life intelligence, and perhaps that there 
are various kinds of life intelligences, or shall we say that 
intelligence is measured to some degree by tests which in- 
volve typical school activities, but that the measure is 
limited by the fact that the material in which the tests are 
represented is of a somewhat specialized nature? 

Possibly the question is one of definition. To the writer, 
however, it seems to be the simpler way of expressing the 
facts to say that the intellectual activity required in the 
various situations of life is similar in character to that re- 
quired in school, but that one’s achievements in any partic- 
ular case are conditioned by materials of thought and his 
familiarity with them, as well as by the form of thought. 

Whichever interpretation is the correct one, the fact 
remains that our so-called intelligence tests have limitations. 
This limitation is sometimes disregarded with serious conse- 
quences. For example, when the scores made by the adults 
in the army on the Stanford-Binet test are given the same 
significance as the scores made by school children on that 
test, a serious error is made. The same error is made when 
the scores of immigrants who have lived in the United States 
for fifteen or twenty years are compared directly with the 
scores of immigrants who have been here but a few years, and 
are treated as having the same significance.1 The danger of 
misinterpretation of the test scores has led some to hold that 
the term intelligence test is an unfortunate one, and that we 
ought to call the tests academic ability tests, or something of 
the sort. If we retain the name, as we probably shall on 
account of its wide acceptation, we shall do so only with the 
distinct understanding that their application is affected by 
the fact that they deal primarily with school material. 

We have seen that the standing in intelligence tests is 
determined in part, not only by the individual’s native in- 
tellectual capacity, but by the nature of his past experi- 
ence. The aim of intelligence tests is, so far as possible, to 
so choose the materials of which they are composed that the 
effect of differences in experience will be reduced to a minimum, 
and this aim has in a measure been attained. No one 
would claim, however, that the attempt has been completely 
successful. That specific teaching of subject-matter similar 
to that which appears in an intelligence test produces a 
marked gain in the scores in the test is shown in an experiment 
by Bishop.2 Groups of high school pupils were given 
special drill in handling problems similar to those in the 
Otis Group Intelligence Scale, and their gains were com- 
pared with paired groups who were given the test twice 


1 For fuller discussion of this problem see Chapter XVII. 
2 Owen Bishop. “What is Measured by Intelligence Tests”; in Journal of 
Educational Research, vol. 9, pp. 29-38. 1924. 
without such drill. The trained groups made gains from 
two to seven times as large as the check groups. This 
training came very near to being direct coaching, and it is 
not likely that ordinary differences in schooling would pro- 
duce such large differences, but it is clear that intelligence 
tests are by no means independent of schooling. 

We now approach another question related to the first 
one, but not identical with it. This question is whether 
general intellectual capacity is to any degree specialized on 
account of differences in aptitude for dealing with different 
types of problems. Part of the apparent specialization of 
general intellectual capacity may be ascribed to the com- 
bination of intelligence and non-intellectual traits. For 
example, two persons of equal intellectual capacity may 
make a very different impression upon their associates in 
personal intercourse, because of differences in their personal- 
ity and in their reaction to social situations. The one, for 
example, may be timid, and may become confused when he 
attempts to express his thoughts to another individual or to 
an assembly. The other person may be actually stimulated 
by the presence of others so that he thinks more clearly than 
when alone. The one may have presence of mind in an 
emergency, while the other loses his head. The one may do 
better under the stimulus of competition, the other when he 
is impelled by intrinsic motives alone. 

Another possible explanation of the individual’s variation 
in the performance of various intellectual tasks is the pos- 
sible effect of variation in what we have designated as 
specialized factors, in association with general intellectual 
capacity. We have regarded manual dexterity, for example, 
as a capacity which is largely specialized. It differs among 
individuals largely independently of general intelligence. 
However, certain tasks requiring general intelligence for 
their performance require also a certain degree of manual 
dexterity, or may be performed better by an individual with 
a high degree of dexterity than by one with a low degree of 
dexterity. Other tasks, on the other hand, demand language 
ability, and this may be regarded, to some degree at least, 
as specialized. Other intellectual operations, such as those 
of mathematics, require the manipulation of abstract sym- 
bols. The facility in the use of symbols may possibly be 
somewhat specialized. 

Whatever may be the explanation of the fact, the basic fact 
of practical importance is this: While we aim to measure 
general intelligence, and while we believe we can with some 
degree of precision measure it by means of our tests, we are 
never able to use perfectly general or abstract material. Our 
material is always particularized, and the test is therefore to 
some degree rendered a specialized test. It is specialized 
both from the point of view of previous experience of the 
individual being tested and from the point of view of special 
aptitudes. This particularization, or specialization of the 
test, is not sufficient to invalidate it, but it is sufficient to 
make it necessary to take account of certain limitations in the 
interpretation of the scores. To express these limitations in 
a sentence, we may say that a general intelligence score is 
always to be regarded as an approximation and not as a 
perfect measure of the intelligence, either of the individual 
or of a group. 

The discussion of this chapter has led to the conclusion, 
based largely on the statistics of mental test scores, that 
there is such a thing as general intellectual capacity, and 
that this capacity can be measured by our intelligence tests, 
with an approximation close enough to give them practical value. 
There still remains the somewhat more theoretical or 
speculative question regarding the nature of intelligence. 
While the technique of testing may proceed a certain distance 
without raising this more ultimate problem, our 
search for tests and our interpretation of their results will 
be helped if we can arrive at some more precise idea of 
what it is we are measuring. This depends upon a further 
analysis than has been made in this chapter. It will be 
attempted in the final chapter, entitled, “The Nature of 
Intelligence.” 


CHAPTER X 
TECHNIQUE AND THEORY OF MENTAL TESTS 


II. PROBLEMS RELATING TO THE SELECTION AND 
ORGANIZATION OF THE ITEMS OF A TEST 


1. Principles concerned with selecting the items of tests 


AFTER it has been determined what kind of test material shall 
be used in a test, the next task is to find particular material 
of the kind which has been determined upon. We need not 
consider here in detail the way in which the maker of a test 
goes about it to find appropriate material. He sometimes 
makes use of test material which has already been devised 
by somebody else. In some cases this material is modified, 
and in other cases new material of a similar sort is discovered 
or invented. In some cases a radically new type of material 
is devised. We may assume that, by any means at his dis- 
posal, the person who is to construct a test has made a col- 
lection of a large number of items of the sort which he wishes 
to use. We are concerned now with the principles which 
should guide him in the selection and arrangement of this 
material. 

The first fundamental principle is concerned with the 
difficulty of the items which are to be included in the test. 
The first requirement is that the items be of a suitable gen- 
eral level of difficulty. The test must be difficult enough and 
not too difficult for the individuals for whom it is designed. 
If it is for primary children, it must be of one level of diffi- 
culty; if it is for adults, it must be of a very different level. 
A test does not usually, as we have seen, represent a single 
dead level of difficulty. The items cover a certain range. 
Thus a primary test may be suitable for administration to 
children of from six to ten years of age. A few tests, as we 
have seen, have such a wide range of difficulty that they are 
designed to be applied to all ages of individuals from the 
third year up. This means that some items are very easy, 
and some are very hard. The test as a whole does not 
represent any one level, although individual parts of the 
test do. 

With respect to the relative difficulty of the various items 
or parts of a test a choice confronts us. We may seek either 
to make all of the items of the same difficulty, or we may 
select items which are graded in difficulty throughout either 
a narrow or a wide range. The maker of a test, in short, has 
to choose whether it will be of uniform difficulty or of graded 
difficulty. 

What are the considerations which will determine this 
choice? We may assume that the test is to be used to differentiate 
between the abilities of a group of individuals. 
This assumption means that, after the test has been given, 
the individuals may be arranged in a series or in a distribu- 
tion table according to the scores which they make on the 
test. 

Consider first what is the basis of this ranking of indi- 
viduals in case the items of the test are all of the same dif- 
ficulty. In order that all the individuals of a group may 
score, the test must be easy enough so that everybody can 
pass the various items. If an individual can pass one of the 
items he can pass all of them, or at least most of them. If 
the test is so difficult that some of the individuals cannot 
pass it, or cannot do any of the items, it will not be possible 
to arrange all of the persons in a series. If the test is easy 
enough for everybody to pass it, the differences in score will 
be determined chiefly by the speed with which the different 
individuals work. The speed of working may be determined 
in part, it is true, by the relation between the diffi- 
culty of the test and the individual’s capacity, but the test 
will be, at least in the main, a measure of speed. 

If, now, the test items are graded in difficulty, it is at 
least conceivable that they may be used to measure, not 
primarily speed, but the limits of capacity. Consider, for 
example, a number completion test. It is thinkable, and 
probably feasible, to construct a scale of number completion 
items which increase in difficulty to a point at which most 
persons would fail. In fact, it is conceivable that the ablest 
mathematical thinker in the world could construct a test on 
which everybody else would fail at some point. A test of 
this character may be said to measure the limits of capacity, 
or, to use a term which has been contrasted with speed, 
power. 

It would be difficult, and perhaps impossible, to construct 
either a pure speed test, or a pure power test. The differences 
in speed, as we have seen, will be caused partly by 
differences in power. On the other hand, most tests do not 
range in difficulty beyond the limits of capacity of at least 
the abler individuals who are tested. If they were given 
time enough, they might pass the entire test. Tests which 
are graded in difficulty are usually given with a time limit. 
While tests are usually neither exclusively speed tests nor 
exclusively power tests, then, they may be predominantly 
the one or the other. They may be comparatively easy and 
given with a time limit which is so set that nobody can 
finish on account of the limitations of time, or they may be 
steeply graded in difficulty and be given with a liberal time 
allowance. 

There has been some inquiry as to whether our ordinary 
intelligence tests are chiefly measures of speed or of power. 
The best known attempt to study this question, and probably 
the most elaborate attempt, was made with the Army 
Scale Alpha.1 The study in question consisted in determining 
the effect of doubling the time of the test. This effect 
was studied in two ways, first, by finding the correlation 
between the scores on single time and on double time, and 
second, by finding the percentage of individuals at various 
levels who improved their score when the time was increased. 

With reference to the correlation between the tests on 
single time and on double time, the authors of the report 
assumed that if the test is a speed test, the correlation would 
be high. The correlation could be low only in case the test 
is a power test. To quote the report: 


A change of order would occur only if the test were of the type 
in which time was relatively unimportant — so-called “power” 
test. Here it might happen that quick individuals scoring high 
would reach the limit of their abilities and fail to profit by addi- 
tional time, whereas slow, capable persons would plod unerringly 
on in the extended period and outdistance in the end their more 
speedy rivals. 


It was found, in fact, that the correlation between single 
and double time was very high, being .965. On the assump- 
tion just mentioned, then, we should conclude that, in so 
far as the correlation fact is concerned, the Army Alpha 
test is largely a speed test. 

It is interesting to note that Brigham has drawn the 
opposite conclusion from the same fact.2 Brigham takes up 
the criticism which has been made of the army tests that they 
are chiefly speed tests, in that they penalize slow, but accurate 

1 See chapter IX, part III, of the Army Report, Psychological Examining in 
the United States Army, edited by R. M. Yerkes. Washington: National 
Academy of Sciences, 1921. 

See also G. M. Ruch and Wilhelmina Koerth. “‘Power’ vs. ‘Speed’ in 
Army Alpha”; in Journal of Educational Psychology, vol. 14, pp. 193-208 
(1924); and G. M. Ruch, “The Speed Factor in Mental Measurements”; in 
Journal of Educational Research, vol. 9, pp. 39-45. 1924. 

2 Carl C. Brigham. Study of American Intelligence. Princeton University 
Press, 1923. 

individuals. He refers to the high correlation be- 
tween the scores on single time and double time, and con- 
cludes, “At least in our consideration of the army test 
results, we may definitely discard the opinion that we are 
testing speed rather than intelligence” (page 12). Brig- 
ham’s argument seems to be that because, when we extend 
the time, the individual who made a high score increases 
his score, as well as the one who made a low score originally, 
the person making the low score was not penalized by the 
time limit. But this is not the issue. The question is not 
whether the time limit was too short to allow the slow individual 
to make a score. Suppose that two men were running 
a race. This would certainly be a speed test. Suppose 
the race were not long enough to test endurance and were 
long enough to be unaffected by the differences in reaction 
time at the start. Suppose that we scored them in distance 
at the end of ten seconds, and one had run ninety yards and 
the other one hundred. Suppose then, we scored them again 
at the end of twenty seconds. If they maintained the same 
speed, the first one would have run something like one hundred 
and eighty yards, and the second something like two hundred 
yards. Doubling the time would not affect their 
relative scores. 
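
The arithmetic of the race may be set out as a minimal sketch. The rates of nine and ten yards per second follow from the distances of ninety and one hundred yards in ten seconds; the function and variable names are illustrative.

```python
def distance_run(yards_per_second, seconds):
    """Distance covered by a runner who keeps a constant speed."""
    return yards_per_second * seconds

# Ninety and one hundred yards in ten seconds.
rates = {"first runner": 9.0, "second runner": 10.0}

for seconds in (10, 20):
    scores = {name: distance_run(rate, seconds) for name, rate in rates.items()}
    ratio = scores["first runner"] / scores["second runner"]
    print(seconds, scores, ratio)
```

The ratio of the two scores is the same at ten seconds and at twenty, so that in a pure speed test the order and relative standing of the individuals are unchanged by doubling the time.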

If the test is a speed test it seems pretty clear that doub- 
ling the time will result in a very high correlation of scores, 
assuming that no considerable number of individuals stop 
before time is called. It seems doubtful, however, that we 
can reverse the argument and say that because there is a 
high correlation the test is necessarily a speed test. This 
depends upon yet another fact, the correlation between 
speed and power. If there is a high correlation between 
speed and power, so that the person who works rapidly also 
has a high degree of power, then doubling the time on either 
a power test or a speed test would give a high correlation 
between the scores on single time and on double time. This 
is because a measure of speed is also a measure of power. If, 
on the other hand, there is a low or zero correlation between 
speed and power, then we should expect that doubling the 
time on a power test would give at least a much lower cor- 
relation between single time and double time scores. It is 
probable that the correlation between speed and power is 
positive, but moderate. If this is the case, and our reason- 
ing is correct, a very high correlation between single and 
double time indicates that the score on the test depends 
largely upon speed of performance. 
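
This reasoning can be tried out on artificial data. The following is a rough simulation and not a reconstruction of any actual test. It assumes that each person has a rate of work and a limit of difficulty which correlate .50 with each other, that on the speed test every item attempted is passed, and that on the power test no item beyond the person's limit is passed. All names and numerical values are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def single_double_correlations(n_people=2000, speed_power_r=0.5, seed=0):
    """Return (r for the speed test, r for the power test), single vs. double time."""
    rng = random.Random(seed)
    speed_1, speed_2, power_1, power_2 = [], [], [], []
    for _ in range(n_people):
        z_speed = rng.gauss(0.0, 1.0)
        z_power = (speed_power_r * z_speed
                   + math.sqrt(1 - speed_power_r ** 2) * rng.gauss(0.0, 1.0))
        rate = max(1.0, 10.0 + 2.0 * z_speed)    # items attempted per time unit
        limit = max(1.0, 15.0 + 4.0 * z_power)   # hardest item the person can pass
        # Speed test: items are easy, so every item attempted is passed.
        speed_1.append(int(rate))
        speed_2.append(int(2 * rate))
        # Power test: items are graded; nothing beyond the limit is passed.
        power_1.append(int(min(rate, limit)))
        power_2.append(int(min(2 * rate, limit)))
    return pearson(speed_1, speed_2), pearson(power_1, power_2)

r_speed, r_power = single_double_correlations()
print(round(r_speed, 2), round(r_power, 2))
```

Under these assumptions the speed test gives a correlation between single and double time close to unity, while the power test gives a distinctly lower one, because on double time many persons reach their limit and their scores then depend on power rather than on rate.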

The writer attempted to test these assumptions by a 
simple experiment. The plan was to construct a predom- 
inantly speed test and a predominantly power test of the 
same kind of material, namely, number completion examples. 
In the one the examples were of approximately equal diffi- 
culty; in the other they increased in difficulty. The tests 
were given to a university class and 46 complete records 
were obtained. The correlation was found between the 
scores on single time and double time for both tests, and 
also between the scores on the two tests. They were as follows: 

Correlation between single and double time, speed test .87 ± .03 
Correlation between single and double time, power test .78 ± .04 
Correlation between speed and power test .63 ± .06 
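
The figure attached to each coefficient is presumably its probable error, although the text does not say how it was calculated. The following sketch assumes the conventional formula, P.E. = .6745 (1 - r²) / √n, with n = 46; the function name is illustrative.

```python
import math

def probable_error_of_r(r, n):
    """Probable error of a correlation coefficient by the conventional formula."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)

for label, r in (("speed test, single vs. double time", 0.87),
                 ("power test, single vs. double time", 0.78),
                 ("speed test vs. power test", 0.63)):
    print(label, r, round(probable_error_of_r(r, 46), 3))
```

On this assumption the formula gives about .024, .039, and .060, which agrees closely with the second and third reported figures and falls slightly below the first.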

The comparative lowness of the correlation between the 
speed and power tests shows that there is a distinction between 
speed and power, and that they are at least not perfectly 
correlated. The lower correlation between single and 
double time in the power test than in the speed test shows 
that if a test gives a very high correlation between single 
and double time the indications are that it is a speed test. 
Hence the Army Alpha Test appears to be largely a speed 
test, as the authors of the army report maintain. The data 
suggests, finally, that if a test is a power test, and the purpose 
is to measure power, it is necessary to allow ample time for 
its performance. 

The army psychologists attacked the question in a second 
way. They calculated the percentage of individuals who 
gained in score on each test at the various levels. That is, 
they found the percentage of individuals who made a score 
of 1 on single time and who increased their score on double 
time, similarly for those who made a score of 2, 3, 4 and so 
on, up to the highest score made. The hypothesis was that 
if the tests were speed tests, the large majority of persons 
would gain on double time at all levels, but if they were 
power tests, a small percentage would gain on double time. 
What they found was that a larger percentage of those 
making high scores than of those making low scores on 
single time gained with double time. They concluded, 
therefore, that for the low grade individuals the tests were 
more largely power tests and for the high grade individuals 
more largely speed tests. This seems to be a very reasonable 
conclusion. They conclude further, that on the whole the 
power factor is not so important as the speed factor even at 
the lower level. 
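
The tabulation described here can be sketched as follows. The scores are hypothetical records for eight persons on one test, and the function name is illustrative; the army data themselves are not reproduced.

```python
from collections import defaultdict

def percent_gaining_by_level(single_scores, double_scores):
    """For each single-time score, the percentage of persons who scored higher on double time."""
    counts = defaultdict(lambda: [0, 0])  # score -> [number gaining, number of persons]
    for single, double in zip(single_scores, double_scores):
        counts[single][1] += 1
        if double > single:
            counts[single][0] += 1
    return {score: 100.0 * gained / total
            for score, (gained, total) in sorted(counts.items())}

# Hypothetical scores for eight persons on one test.
single = [1, 1, 2, 2, 2, 3, 3, 3]
double = [1, 2, 2, 3, 2, 4, 5, 3]
print(percent_gaining_by_level(single, double))
```

A percentage which rises with the level of the single-time score, as in the army results, indicates that the time limit restricted the abler individuals more than the less able.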

Whatever may be the fact concerning the army test, we 
have here an issue which is important. It is probable that 
for some purposes, and in some situations, speed of perform- 
ance is the important qualification. In other cases, how- 
ever, the speed is relatively unimportant, and power, or 
the capacity to succeed in a task which is too difficult for 
other individuals, is the important characteristic. In type- 
writing and stenography, for example, speed is the important 
factor. Speed, of course, is not here contrasted with ac- 
curacy, and accuracy is not to be identified with power. 
What has been said about speed is true of effective speed, or 
speed combined with accuracy. In invention and creative 
scientific work, on the other hand, power is relatively much 
more important than speed. It is of little moment whether 
Newton, and Darwin, and Tesla, and Edison, and Einstein 
developed their theories or perfected their inventions in one 
year, or ten years, or twenty years. The important thing 
is that they were able to perform intellectually far above 
the average individual. Their work possesses an import- 
ance which is to be measured in terms of quality, rather than 
of quantity. 

The conclusion, then, is that not all intellectual perform- 
ances can be measured in the same dimension. Some 
need to be measured, perhaps, largely in the dimension of 
speed, some in the dimension of quality or power, and some 
in a combination of both. Most of our tests measure a little 
of both, with probably the greatest emphasis on speed. This 
may be the most serviceable, in so far as a single measure is 
concerned. It seems likely, however, that it would be de- 
sirable to secure an analysis of the individual’s ability by a 
test, part of which should depend largely upon speed of 
performance, and part upon power. In this way we could 
establish, not only the individual’s general score, but also 
his capacity in each of these two characteristics separately. 

The discussion of power tests and speed tests, it will be 
recalled, was introduced in the consideration of the two ways 
of organizing the items of a test, namely in a series of equal 
difficulty or in a series of graded difficulty. We may com- 
ment further on the procedure which is to be followed if the 
items are to be graded in difficulty. When test items are to 
be arranged in order of difficulty it is usual to take as the 
measure of difficulty the percentage of individuals who pass 
the test or who fail upon it. Thus a hard item, of course, is 
one on which a large percentage fail. 
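
The measure of difficulty just described may be computed as in the following minimal sketch. The record of passes and failures is hypothetical, and the function names are illustrative.

```python
def percent_failing(responses):
    """responses[p][i] is True when person p passed item i."""
    n_people = len(responses)
    n_items = len(responses[0])
    return [100.0 * sum(1 for person in responses if not person[i]) / n_people
            for i in range(n_items)]

def order_by_difficulty(responses):
    """Item indices from easiest to hardest, by percentage failing."""
    failing = percent_failing(responses)
    return sorted(range(len(failing)), key=lambda i: failing[i])

# Hypothetical record of four persons on three items.
responses = [
    [True, False, True],
    [True, False, False],
    [True, True, True],
    [False, False, True],
]
print(percent_failing(responses))
print(order_by_difficulty(responses))
```

Items failed by the same percentage are left in their original order; in practice a larger group of persons would be needed to separate them.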

The attempt is usually made to select items which shall be 
graded in equal steps or intervals in difficulty. In determining 
the steps in difficulty the assumption is made that abili- 
ties are distributed according to the normal probability 
curve. This means that at the upper and lower extremes of 
the scale a given step or unit will be represented by com- 
paratively few individuals, whereas in the middle part of the 
scale of difficulty a unit of ability will be represented by a 
large number of persons. Since the number of persons who 
possess abilities which are measured by the various units of 
the scale are not equal in number, it is not correct to deter- 
mine the units of the scale directly by equal numbers of 
percentages of passing or failing. It is necessary to find 
the percentages failing which correspond to equal steps on 
the scale and then to find items which are failed by these 
percentages. The scores representing equal steps of difficulty 
are called percentile scores. A table for transmuting 
percentage of failing into percentile scores may be found in 
Rugg’s Statistical Methods, p. 392.1 
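
The conversion can be sketched with the normal curve directly. The following assumes that ability is normally distributed and expresses difficulty in standard-deviation units; the units of the published table are not assumed to be the same, and the function names are illustrative.

```python
from statistics import NormalDist

def scale_value(percent_failing):
    """Difficulty in standard-deviation units on an assumed normal curve of ability."""
    return NormalDist().inv_cdf(percent_failing / 100.0)

def percent_failing_for(scale_value_sigma):
    """Percentage failing an item placed at a given point on the scale."""
    return 100.0 * NormalDist().cdf(scale_value_sigma)

# Equal steps in percentage failing are not equal steps on the scale.
print(scale_value(60) - scale_value(50), scale_value(90) - scale_value(80))

# Percentages failing which correspond to equal steps of half a standard deviation.
for step in (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5):
    print(step, round(percent_failing_for(step), 1))
```

The step from 80 to 90 per cent failing covers a greater distance on the scale than the step from 50 to 60 per cent, which is why items must be chosen by the percentages corresponding to equal scale steps and not by equal differences of percentage.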

The other method of scaling the difficulty of parts of a 
test is the mental-age method. We have already seen how 
this method is applied in the development of age scales. 
In the application of this method we assume that a test which 
is passed by a given percentage of the older children is more 
difficult than one which is passed by the same percentage of 
younger children. Tests may therefore be arranged in the 
order of difficulty by taking those which are passed by a 
given percentage of children of succeeding ages. We cannot 
assume, however, that the differences in difficulty between 
the tests, when they are selected by this method, are equal. 
This would be assuming that the growth in mental capacity 
from age to age is uniform. Whether this is true or not is a 
matter which cannot be assumed, but must be determined 
by other methods of investigation, or on the basis of tests 


1H. O. Rugg. Statistical Methods Applied to Education. Houghton 
Mifflin Company, 1919. 
which are standardized in another fashion. An age scale,
since it is standardized in terms of age growth, can give us 
no information concerning the form of the curve of mental 
development. 


2. The number of items in a test, or the length of the test 

It was remarked in Chapter VII that there is reason to 
suppose, theoretically, that a longer test is more reliable 
than a shorter one. Certain authors of tests have applied
this principle to test construction by employing considerably 
more material than is usual, and by correspondingly length- 
ening the time required to take the test. Thus, the Thorn- 
dike Intelligence Examination for High School Graduates 
occupies three hours. The question now before us is whether 
it can be determined just what the relation is between length 
and reliability, and whether from this it can be determined 
how long a test should be. 

To determine the relation between length and reliability 
a formula has been devised by Spearman and applied experi- 
mentally by Holzinger. If we have a series of similar tests 
of equal length and reliability, and we know the reliability of 
each of the tests or components, we can predict from this 
formula what the reliability of a composite of any number 
of the components should be. Of course, the components of 
a test are not entirely similar nor of equal length or reliabil- 
ity, and therefore the formula cannot be applied rigidly, but 
Holzinger has shown that actual test scores approach this 
law as the characteristics of the tests approach the assumed 
characteristics. In the case of the Otis Self-Administering 
Test, which is typical of the tests we are here interested in, 
the actual increase in reliability, in comparison with the 


1 Karl J. Holzinger and Blythe Clayton. “Further Experiments in the
Application of Spearman’s Prophecy Formula”; in Journal of Educational
Psychology, vol. 16, pp. 289-99. 1925.
theoretical increase according to Spearman’s law, is shown
in Fig. 11 (p. 293, op. cit.). 
Fig. 11. Otis Intelligence Test data. (From Holzinger and Clayton.)


We may conclude tentatively from the facts before us, 
that for tests composed of the material commonly used in 
our intelligence tests, and for components which occupy one 
and one half minutes working time (the divisions used in 
the above experiment), an increase in the number of com- 
ponents up to five causes a marked increase in reliability, 
and that there is a continued slight increase in reliability 
up to at least ten units. 
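
The relation between length and reliability may be illustrated by a short sketch. The text does not print Spearman's formula; the form used below is its usual statement, r_n = n r / (1 + (n - 1) r), and the component reliability of .50 is a hypothetical value chosen for illustration, not the figure obtained with the Otis test.

```python
def prophesied_reliability(component_reliability: float, n_components: float) -> float:
    """Reliability of a composite of n similar components of equal reliability."""
    r = component_reliability
    n = n_components
    return n * r / (1.0 + (n - 1.0) * r)


component_r = 0.50  # hypothetical reliability of a single component
for n in (1, 2, 5, 10):
    print(n, round(prophesied_reliability(component_r, n), 3))
# 1 0.5
# 2 0.667
# 5 0.833
# 10 0.909
```

On these assumed figures the gain from one component to five is about .33, while the gain from five to ten is less than .08, which agrees in general form with the tentative conclusion stated above.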


3. The form of organization of the items of a group 
language test 
It was remarked in discussing the development of group 
tests that they depended upon the invention of modes of 
organization which required the subject or the person being
tested to react only by very simple means. The time re- 
quired to write out a long response, and the difficulty of 
scoring such a response, is a serious obstacle to its use in a 
group test. We may now describe briefly some of the most 
important devices which have been used for this purpose. 
Since the devices used in language tests differ somewhat 
from those used in non-language tests, we may consider them 
separately. 

1. One of the earliest types of language test to be adapted 
to group testing is the completion test. The individual is 
required to supply a missing word, or part of a word, or group 
of words, in a printed passage, to supply part of the letters 
of a word or to supply a number in a series. This type of 
test has been used not only in intelligence scales, but also 
with success in educational tests. In both cases, but par- 
ticularly in educational tests, the interpretation of the in- 
dividual’s response is more or less in doubt. If a person 
fails to supply correctly the missing item, we cannot be sure 
whether this is due to his lack of the information, which is 
required to supply it, or to his inability correctly to solve 
the puzzle which is represented in such a problem. This 
ambiguity is not so serious a difficulty in intelligence tests 
as in educational tests, since in either case intelligence may 
be the capacity which is required. 

2. A second type of test which is very common is the
alternative or multiple choice type. This is represented, 
for example, in the yes-no, the right-wrong, or the same- 
opposite tests. In all of these cases the individual is required 
simply to make a choice between alternatives. In other 
cases a series of possibilities is represented numbering from 
three to five, and the individual is required to make a choice
among them. The multiple choice test may be illustrated
by the analogies test. Examples of this may be found in 
Army Alpha, which was reproduced on page 139. Another
illustration is the classification test, or a test such as the 
following, which appears in the National Intelligence Test: 
“Underline the words in parenthesis which tell what the 
thing designated by the first word always has. Lake (fish, 
salt, sand, shore, water).” This test requires a good deal
more intellectual activity than do some of the multiple 
choice tests. Another type requires the individual to check 
one of a series of answers which is the correct answer to a 
preliminary statement. Test III of Army Alpha is an illus- 
tration of this type. 

Two types of questions arise concerning the alternative or 
multiple choice types of test. The first is concerned with 
the more general psychological significance of the reaction 
which is required by it. The second is a technical question 
regarding scoring, which will be discussed in the next chap- 
ter. The first question is this: How does the grasp of a fact 
which is sufficient to enable one to judge whether a statement 
concerning it is right or wrong compare with that grasp 
which is necessary to enable one himself to make a state- 
ment concerning it? Take a random illustration. The 
ability to give a correct answer to the question, “Does 
manual labor always result in cerebral hemorrhages?” gives
very little information about the individual’s knowledge of
the relation between physical exertion and disturbances of the
circulation. One may reply that the purpose of the intel- 
ligence test is not to determine what the individual’s infor- 
mation or ability to grasp a particular fact is, but rather to 
establish the relationship between his capacity and that of 
another person. For such purposes as general comparison, 
tests organized in this way have proved themselves to be very 
useful. It is worth calling attention to the fact, however, 
that they are not analytical. They do not enable the ex- 
aminer to determine precisely and in detail the intellectual 
equipment of the individual. 
3. A third device requires the individual to designate a
superfluous part. This method has been used most pro- 
minently in the Pressey Mental Survey Scales, Cross-Out 
Tests. For example, a disarranged sentence is presented 
with one extra word. The subject is required to rearrange 
the sentence in mind and in this way determine what the 
excess word is and then cross it out. In another test a 
group of words is presented, all of which designate objects 
of the same class, except one. This one is to be crossed out. 
This type of test, if carefully planned, may necessitate a 
careful examination of the subject-matter of the test and 
real thought on the part of the examinee. 

4. A fourth type requires that a series of items be arranged 
in rank order. This has been applied most often to moral 
judgment tests, or ethical tests, in which the individual is 
required to arrange in order of their seriousness a number 
of misdemeanors. 

5. A fifth type may be classed with the fourth in that it 
raises somewhat similar questions. It requires that two 
series of words be matched in pairs. An example of this type 
of test is the “matching proverbs” test in the Otis Advanced
Examination. It is hard to tell how far one’s success in
passing tests of this sort is due to his ingenuity in experi- 
menting with the arrangement of the items, and how far it is 
due to his ability to comprehend the relationships which are 
involved. Again we may say that so far as securing a general 
measure of intelligence is concerned this question may not 
be important; but the distinction which has just been raised 
is important in its bearing upon the effect of training or 
practice in the ability to pass a group test. This practice 
effect has sometimes been found to be considerable, running 
about ten per cent or more. It seems likely that the practice 
effect consists largely of an increase in the ability of the in- 
dividual to handle the mechanics of the test. If this is the 
case, differences in practice with this kind of material must
be taken into account whenever we compare groups of in- 
dividuals, or individuals who are likely to have had different 
opportunities to secure this practice. 

6. A sixth type of test requires that the parts be re- 
arranged so as to make sense. This method is used in the 
test which requires the words of a sentence, which have been 
placed in random order, to be rearranged so as to make 
sense. It appears in the Binet scale, and has been used in a
good many group tests. It is at once a test of ingenuity and
of familiarity with the subject-matter of the sentence.


4. Modes of organization of the items of the non-language test 


At least four of the types of examples which are used in 
the non-language test are analogous to the types which are 
used in language group tests. For example, the completion 
form is very commonly employed. The completion test is 
used with pictures in which one part is omitted, which the 
child is to draw in. It is also used with a series. An example of 
this is the so-called “X-O” series, in the Army Beta test.

The alternative or multiple-choice test is also used. The 
opposites test is represented graphically by means of draw- 
ings of objects. The classification test is employed by show- 
ing drawings of objects in place of words. 

Rearranging the parts of a series so that they shall be in 
an order which shall have sense is represented by the “Foxy 
Grandpa” series of pictures in the Performance Scale which
was used in the army. Other series of pictures of a similar 
sort have been used. 

The test which requires a superfluous part to be crossed 
out is represented by the commonly used absurdity test. 
This consists of a series of pictures, each of which contains 
some part which does not belong in the picture. 

In addition to these types of tests, which are analogous to 
those used in the language scales, are some that are peculiar
to the non-language test. For example, the directions test is 
a very common one. This is illustrated by the first test in 
Army Alpha, which is, in fact, a non-language test. Many 
of the tests in the Dearborn Scale are directions tests, and 
the same is true of the Cole-Vincent Test for School En- 
trants, and a number of other primary tests. 

A common form of test is the one which requires the re- 
cognition or combination of figures of different shapes. In 
some cases the figures are to be combined in order to make
another figure. In some cases the form of a figure is to be 
reproduced in a drawing, and in other cases figures are to be 
matched. The cube analysis test of Army Beta is somewhat 
related to these tests. 

Finally, we may mention the tests which require the inter- 
pretation of pictures. An example of this type is a picture of 
a boat which the child is required to interpret so as to tell 
whether the boat is moving or still. 

The same critical questions which were raised in regard to 
the types of tests used in the language scales may be applied 
to these types when they are used in the non-language 
material. We have recently passed through a period which 
has been very fruitful in the invention of new devices for 
group testing. In the next period of development we shall 
probably have a much larger amount of critical experimental 
evaluation of these methods than has characterized the 
period of rapid development. 


CHAPTER XI 


TECHNIQUE AND THEORY OF MENTAL TESTS 
III. PROBLEMS RELATING TO SCORES AND NORMS 


1. Mental test scores; the raw score 


THE score in a mental test is, of course, a numerical quan- 
tity. The meaning of this quantity, however, depends upon 
the nature of the material of which the test is composed, 
and the way in which the material is organized. The raw 
score is the expression of the achievement of the individual 
in terms of the unit of which the scale is composed. The raw 
score has no significance in itself. The same raw score may 
mean a different thing in the case of different tests, according
to the unit which is employed, or to the conditions of the 
test. One exception to this statement is to be found in the 
case of mental age, provided we include this among the raw 
scores. The raw score takes on significance as it is trans-
lated into comparative or relative measures. The way in
which this is done will be considered after we have mentioned 
a number of illustrations of raw scores. 

The raw score may consist, in the first place, in a numeri- 
cal statement of the amount which is accomplished within a 
given time limit. An illustration of such a raw score is that 
which is obtained by counting the number of letters which a 
person crosses out in a printed text. Another score of sim- 
ilar character expresses the number of substitutions which a 
person makes, as in the case of the digit-symbol test. It is 
obvious that these raw scores are affected by the character 
of the material which is used. 

The amount done when a time limit is not imposed is a 
type of raw score which, as we have already seen, may be 
regarded as a measure of power. Illustrations of this sort
of score are obtained from tests of span of attention. For 
example, the test used in the Binet scale which requires the
individual to reproduce a list of numbers which is spoken to 
him measures the limits of capacity in this kind of perfor- 
mance at the time of the test. It is not affected primarily 
by the speed with which the individual is required to respond. 
A test in which the items are arranged in increasing diffi- 
culty to a point beyond the capacity of the individuals 
taking it, when the time is not limited, may be called a 
power test. 

The score of a speed test may be expressed in terms of the 
amount done in a given period of time, or the time required 
to do a given amount. The amount performed in a given
time can be used in a group test, but the time required to 
do a given amount can be used most conveniently in an 
individual test. This is because it is not easy to record 
the time which is occupied by various individuals of a 
group in performing a given set task. 

The time score has an advantage over the amount score 
in that all the individuals who are given the test perform the 
same amount of work. If there are irregularities in the 
difficulty of the different parts of the test, these will affect 
equally the scores of all the individuals. If the time limit 
is used, however, some individuals may meet certain diffi- 
culties which others escape. The difference is not one of 
great importance if the items of the test are well graded. 

Another type of raw score is given in terms of units dis- 
criminated. This type of score is illustrated in the sensory 
discrimination tests. For example, in the test for discrim- 
ination of pitch the unit of measure is the vibration fre-
quency. The score of the individual is the least difference
between the vibration frequencies of the two tones which 
can be discriminated, assuming a certain basic pitch as a 
standard of comparison. In the case of weight discrimina-
tion, the unit difference might be expressed in fractions of
an ounce, or grams, or any other unit of weight. In dis- 
crimination of the intensity of sound, the unit needs to be 
expressed in terms of the instrument which produces the 
sound. 

The most common type of raw score in current use is the 
point score. The most common method of finding the point 
score is to add up the items of a test which the individual 
passes. In some cases a deduction is made for errors, and in 
other cases different parts of a test are given different 
weights. These procedures will be discussed later in the 
chapter. In some cases the point score is made up of such 
constituents as the number of moves which are made in 
passing a test, and the time which is taken. In all cases, 
the point score is quite obviously a raw score, in the sense 
that it is not self-interpretative. Its significance needs to 
be found by comparison with a standard. 

The mental age may be regarded as another form of raw 
score, but it is different from the ones which have been 
mentioned in that it carries with it its own significance. 
This significance, however, is incomplete, unless a relative 
score, such as the I.Q., is found. 


2. The accuracy of the score and the sources of error 


We have already seen in our review of Spearman’s
critique of mental testing, in Chapter III, that he called 
particular attention to the problem of the accuracy of the 
scores in mental tests, and that he proposed methods for 
determining the accuracy of scores and for making allowance 
for errors. We may now proceed to an analysis of the sources 
of errors as they have been brought out in subsequent in- 
vestigation. 

A clear account of errors in test scores has been given by 
Holzinger.1 We shall follow his classification. He distin-
guishes five types.


1. Scale errors. These errors are inherent in the tests them- 
selves. They consist in the selection of unsuitable material for the 
test, or in the imperfect gradation or arrangement of the material. 
The correction of scale errors can be accomplished by the im- 
provement of the test itself. 

2. Scoring errors. Scoring errors occur chiefly in “product 
scales,” in which the pupil’s product is compared with those which 
compose a scale. They are errors in judgment which occur in 
grading specimens. These errors can largely be eliminated in the 
usual type of mental test in which the pupil makes a response that 
can be scored objectively and in quantitative terms.

3. Response error. This is a very troublesome error. It is caused
by the actual fluctuation in the pupil’s response from one occasion 
to another, caused by changes in emotional condition, interest or 
effort. It is this error chiefly which is estimated by the reliability 
coefficient. It is avoided by using the composite result of a suffi- 
cient number of tests. 

4. Sampling error. This error appears when we take the scores 
of a given group as representative of another group or of persons 
in general. It applies especially to the use of norms. 

5. Sporadic error. “Sporadic errors are those due to arith-
metical blunders in scoring, misunderstanding of test directions,
time lost by the pupil with a broken pencil, etc. Such errors may
be eliminated” (p. 281).


For the purposes of mental tests, the most important of 
these errors, except for the use of norms, are the scoring 
errors and the response errors. If the scoring errors are 
largely eliminated, as they may be, our concern is mainly 
with the response errors. The interpretation of the score 
of a pupil is determined by the amount of this error. Hol-
zinger gives formulae by means of which this error may be
estimated. It is important for the user of mental tests who 
does not have at his command the technique of estimating 


1 Karl J. Holzinger. “An Analysis of the Errors in Mental Measure-
ment”; in Journal of Educational Psychology, vol. 14, pp. 278-88. 1923.




response errors to take two precautions. First, he should
be particularly wary of conclusions drawn from repeated test
scores which are designed to show the individual’s progress.
Second, in estimating a pupil’s ability he should avail him-
self of the benefit of several tests, rather than of one alone.


3. Treatment of wrong answers

Except in special cases the wrong answers in a mental test
are disregarded. The score is made up of the total number of
correct answers. There are certain cases, however, in
which it seems theoretically desirable to take some account
of the errors as well as of the correct responses. These are


It is possible for the individual to obtain the correct answer
in a certain number of cases by pure guessing. If the score
is to represent exactly the individual’s knowledge or capac-
ity, such right answers ought clearly to be discounted. The
practice has been to attempt to determine how much the
score should be discounted by calculating from the number
of errors the number of right answers which the individual
probably got by guessing.

The theory and the resultant practice may be set forth by
an illustration from the yes-no type of test. In point of fact,
while the theory would apply to the multiple-choice test, as
well as to those which offer only two choices, it has been
applied only to the two-choice tests. The theory may be
illustrated thus: Suppose that we were in possession of all
the facts concerning an individual’s response to a test.
Suppose, then, the following situation exists. There are
twenty items in a test. The individual knows ten of the
items. He therefore passes these ten correctly because he
knows the answer. He attempts six more items, but guesses
on all of them. According to the theory of chances he would
get three right and three wrong. (This, of course, would only 
be true in the long run, and not in each particular case, but 
only in a certain proportion of them.) The number of right 
answers would then be thirteen, the number of wrong an- 
swers three. If now, we started only with a knowledge of 
the number of right answers and of the number of wrong 
answers, we could work back to the true score by subtract- 
ing the wrong answers from the right answers. This would 
give us the number which the individual got right because 
he knew the answers. This number would, of course, be ten. 
We may express this procedure for finding the correct answers 
in the following formula: True Score = Right - Wrong.
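
The arithmetic of the illustration may be set down as a short sketch. The function name is chosen for illustration. The optional `choices` argument is an assumption that goes beyond the text: it applies the customary extension of the same reasoning to tests offering more than two alternatives, in which each wrong answer is taken to stand for 1/(choices - 1) lucky guesses.

```python
def corrected_score(right: int, wrong: int, choices: int = 2) -> float:
    """Score after discounting the right answers presumed due to guessing.

    With two alternatives this is simply Right - Wrong.
    """
    if choices < 2:
        raise ValueError("a test item must offer at least two alternatives")
    return right - wrong / (choices - 1)


# The case described above: ten items known, six guessed, three of the
# guesses right and three wrong.
right = 10 + 3
wrong = 3
print(corrected_score(right, wrong))  # 10.0
```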

This procedure has been adopted almost universally in 
tests of the alternative type. It has been vigorously criti- 
cized from various points of view, however, and it is very 
questionable whether it is based upon correct assumptions. 
The first criticism is based upon the theory of chances. The 
procedure assumes that if an individual guesses on a certain 
number of the items of the test, he will guess right as many 
times as he will guess wrong. As was parenthetically re- 
marked in the previous paragraph, this would be true only in 
the long run. It would not be true in a large proportion of 
the individual cases. This is because the number of items 
on which the person guesses is so small. If there were
one hundred, for example, the proportion between right and
wrong guesses would be close to 50 per cent in nearly all
cases. The procedure then breaks down purely on the
theory of chances.1
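
The force of this criticism may be shown by computing the chances directly. The sketch below assumes that each guess on a two-choice item is right with probability one half, and it takes "close to 50 per cent" to mean between 40 and 60 right guesses in one hundred; both the function name and that interval are assumptions made for illustration.

```python
from math import comb


def chance_of_right_guesses(n_guessed: int, low: int, high: int) -> float:
    """Probability that the number of right guesses falls between low and high inclusive."""
    favorable = sum(comb(n_guessed, k) for k in range(low, high + 1))
    return favorable / 2 ** n_guessed


# Six guessed items: chance of exactly three right and three wrong.
p_six = chance_of_right_guesses(6, 3, 3)
# One hundred guessed items: chance of 40 to 60 right.
p_hundred = chance_of_right_guesses(100, 40, 60)
print(p_six)                # 0.3125
print(round(p_hundred, 3))  # 0.965
```

With six guesses, then, the even division which the formula presupposes occurs in fewer than one case in three.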

1 See the discussions of the effect of chance on the score which is obtained
by the formula, S = R - W, by H. H. Hahn: “A Criticism of Tests Requiring
Alternative Responses”; in Journal of Educational Research, vol. 6, pp. 236-40
(1922); and Wm. Asker, “The Reliability of Tests Requiring Alternative
Responses”; in Journal of Educational Research, vol. 9, pp. 234-40, 1924.
The second objection is a psychological one. We cannot
assume that all of the wrong answers are guesses, nor can we 
assume that a person guesses on as many answers which he 
got right as on those which he got wrong. West shows, in 
an experimental study of the alternative response type of 
test, that the subjects may not guess right the same number 
of times that they guess wrong, and that the score which is 
obtained when we use the formula is not the same as the 
true score which we derive when we know the items on which 
the individuals actually guess. West had a class of college 
students take an opposite test consisting of fifty items. He 
had the individuals tell him all of the items on which they 
were confident of the answers and all on which they guessed. 
He secured the following results. 


TABLE XI. ANALYSIS OF THE RESPONSES TO A RIGHT-WRONG
TEST (from West)

Guess Right | Guess Wrong | Guess Total | Not Guess but Wrong


A number of facts may be noted from these data which 
contradict the assumption underlying the R — W formula. 
The first is that not all the answers which are wrong can be 
assumed to be guessed. In this particular case, about half 
of the wrong answers were wrong, not because they were 
guessed, but because the subjects did not know the answers. 
The subjects did not guess on these items, but nevertheless 
they got them wrong. A calculation which was based on 


1 Paul V. West. “A Critical Study of the Right-Wrong Method”; in
Journal of Educational Research, vol. 8, pp. 1-9. 1923. 
the assumption that these items were guessed would there-
fore be in error. The second point is that the subjects did
not guess on the same number right as they guessed wrong. 
They guessed more right than wrong. 

Aside, however, from the question whether the assump- 
tions of the procedure are correct or not, it appears that it 
does not give the true score. The formula gives a score of 
21.07. The true score, as based upon the testimony of the 
individuals taking the test, should have been 26.19, or 5.12 
above the calculated score. The true score is found by sub- 
tracting from the number of attempts all those which were 
guessed and also all those which were got wrong, but which 
were not guessed. Not only is the average true score of the 
group higher than the score which is obtained by the for- 
mula, but the individual scores are not highly correlated. 
The correlation between the true scores and those obtained 
by the formula is only .718. The average deviation between 
the true scores and those obtained by the formula is 6.35. 
There are thus very large individual variations in the cal- 
culated scores from the true scores. 
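
The two scores compared above may be contrasted in a brief sketch. The record of the single subject shown is hypothetical and is not taken from West's data; the function names are likewise chosen for illustration.

```python
def formula_score(right: int, wrong: int) -> int:
    """Score by the conventional formula, Right - Wrong."""
    return right - wrong


def true_score(attempts: int, guessed: int, wrong_not_guessed: int) -> int:
    """Items attempted, less all guessed items and all items wrong without guessing."""
    return attempts - guessed - wrong_not_guessed


# Hypothetical subject on a fifty-item test.
attempts = 50
guessed_right, guessed_wrong = 7, 5
wrong_not_guessed = 6

wrong = guessed_wrong + wrong_not_guessed
right = attempts - wrong

print(formula_score(right, wrong))  # 28
print(true_score(attempts, guessed_right + guessed_wrong, wrong_not_guessed))  # 32
```

For this assumed subject the formula falls four points below the true score, partly because the guesses were not evenly divided and partly because six wrong answers were not guesses at all.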

As a substitute for this theoretical method of determin- 
ing what deduction should be made for errors, Thurstone 
proposes that the proper deduction for errors should be 
determined in an empirical fashion. He suggests a formula 
by which may be determined what deduction for errors will 
give the highest correlation between the tests and the cri-
teria.1 The purpose of the procedure is not primarily to
allow for guessing, but to find the proper relative weight to 
give to speed and accuracy. Thurstone gives data to show 
that a deduction for errors which is determined by this 
empirical method will give a good correlation with the 
criterion, whereas another type of deduction for errors will 


1 L. L. Thurstone. “A Scoring Method for Mental Tests”; in Psycho-
logical Bulletin, no. 16, pp. 235-40. 1919.
give a much lower correlation. This procedure is probably
preferable to the use of the R — W formula, but it is cum- 
bersome to use, and it is uncertain whether it would be 
satisfactory in all cases. 
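
The empirical idea may be sketched as follows. Thurstone's own formula is not given in the text, and the sketch does not attempt to reproduce it; it substitutes a plain trial of several candidate deductions, keeping the one whose scores, Right - c x Wrong, correlate most highly with the criterion. All figures and names are hypothetical.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Product-moment correlation between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def best_deduction(rights, wrongs, criterion, candidates):
    """Return the candidate deduction c giving the highest correlation, with that correlation."""
    def correlation_for(c: float) -> float:
        scores = [r - c * w for r, w in zip(rights, wrongs)]
        return pearson(scores, criterion)

    best = max(candidates, key=correlation_for)
    return best, correlation_for(best)


# Hypothetical records of six pupils.
rights = [30, 34, 25, 40, 28, 36]
wrongs = [2, 10, 1, 12, 6, 3]
criterion = [70, 62, 66, 75, 58, 80]
candidates = [i / 4 for i in range(13)]  # deductions of 0, 1/4, 1/2, ... 3 per error

best_c, best_r = best_deduction(rights, wrongs, criterion, candidates)
print(best_c, round(best_r, 3))
```

A deduction chosen in this way is suited to the particular group and criterion from which it was derived, which is one reason for doubting whether it would be satisfactory in all cases.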

Holzinger 1 has shown that all necessity for allowing for
guessing disappears when all the individuals are given op- 
portunity to attempt all the items of the test, since under 
these circumstances there is a perfect correlation between 
the scores consisting of the right answers and of the rights 
minus the wrongs. This procedure, of course, cannot be
followed in a test in which a time limit is imposed. 
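
The reason for the perfect correlation is easily seen. When every item is attempted, the number wrong is the number of items less the number right, so that Right - Wrong becomes 2 x Right less the number of items, a straight-line function of the number right. A minimal sketch, with hypothetical figures:

```python
def rights_minus_wrongs(right: int, n_items: int) -> int:
    """Right - Wrong when all n_items are attempted, so that Wrong = n_items - Right."""
    wrong = n_items - right
    return right - wrong


n_items = 50                    # hypothetical length of the test
rights = [21, 35, 28, 44, 35]   # hypothetical numbers right for five pupils
corrected = [rights_minus_wrongs(r, n_items) for r in rights]
print(corrected)  # [-8, 20, 6, 38, 20]

# The pupils stand in the same order on either score.
order_by_rights = sorted(range(len(rights)), key=rights.__getitem__)
order_by_corrected = sorted(range(len(corrected)), key=corrected.__getitem__)
print(order_by_rights == order_by_corrected)  # True
```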

If the alternative type of test is used the best procedure 
seems to be to score in terms of right answers. The in- 
structions should be so worded as strongly to discourage 
guessing, instead of encouraging guessing as is done in some 
of our tests. This would greatly reduce the errors in scores 
from this source. Furthermore, it seems desirable, wherever 
convenient, to use another type of test in place of the alter- 
native type. The increase of the number of choices from two 
to three, four, or five decreases the chance of error from 
guessing proportionately. The definiteness and the ease of 
scoring of the multiple choice test makes it a very useful 
one. It is particularly useful when our purpose is merely 
to discover the relative grasp of the subject or the relative 
intelligence of individuals of a group. If we wish to deter- 
mine absolutely the amount of information possessed, how- 
ever, the test has a definite limitation. This limitation grows 
out of the fact that it requires a much more thorough grasp 
of a subject to give an item of information independently 
than is required to designate the correct answer out of a 
number of possible answers. If a more independent grasp 
of information is to be measured, it is better to use the com- 


1 Karl J. Holzinger. “On Scoring Multiple Response Tests”; in Journal
of Educational Psychology, vol. 15, pp. 445-47. 1924. 
pletion type of test, or some other type which requires the
individual to supply the answer himself instead of indicating 
a choice among answers. 


4. Weighting test scores 

Scores are weighted in order to modify the share which 
the raw scores of the individual items of a test have in the 
total score, or to modify the share which the raw score of one 
test has in the composite score of a group of tests. In some 
cases weighting is introduced in order to equalize the share of 
various items or various tests, and in other cases it is intro- 
duced in order to make the share of the items or of the tests 
unequal. 

Weighting for the purpose of making the share of various 
tests of a scale equal is commonly represented in the follow- 
ing situation. Suppose that a scale consists of five tests. 
Suppose further that these tests contain an unequal number 
of items, for example: test 1, 30; test 2, 20; test 3, 10; test 4, 
15; and test 5, 25. If the difficulty of the various tests is 
scaled alike, it is clear that test 1 will contribute three times 
as much to the composite score as test 3, and twice as much 
as test 4. Unless we regard test 1 as more important than 
the other tests in this proportion, the use of the raw scores 
will throw the scale out of balance. This is usually pre- 
vented by multiplying the score on each of the tests by a 
factor which will make the total possible score on each test 
about equal to that on the others. Since the difference 
between the fifth and the first test is small, we might disre-
gard it and correct only the scores on tests 2, 3, and 4 by
making the total possible score on each thirty; thus the
score of test 2 would be multiplied by 1½, the score of test 3
by 3, and the score of test 4 by 2.
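
The multipliers in such a case follow from dividing the desired total possible score by the number of items in each test. A minimal sketch using the five tests of the illustration; the function name is chosen for illustration, and the exact multiplier for test 5 is shown although the text proposes to disregard it.

```python
from fractions import Fraction


def equalizing_weights(items_per_test: dict[int, int], target: int) -> dict[int, Fraction]:
    """Multiplier which brings each test's total possible score to the target."""
    return {test: Fraction(target, n_items) for test, n_items in items_per_test.items()}


items_per_test = {1: 30, 2: 20, 3: 10, 4: 15, 5: 25}
weights = equalizing_weights(items_per_test, target=30)

for test, weight in weights.items():
    print(test, weight)
# 1 1
# 2 3/2
# 3 3
# 4 2
# 5 6/5
```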

It is not altogether clear that the weighting of the tests 
of a scale so as to equalize their share in the total score is 
necessary or desirable. The correlation between the total
scores of scale A, in which the individual tests are weighted 
in this way, with the total raw scores, is given in the Army 
Report (page 340). The correlation with one group of 900 
men between the weighted and unweighted scores was .994. 
With another group of 300 men it was .93. Weighting of 
this sort appears to make little difference in rank. This 
would appear to be what we should expect theoretically. 
Equalizing the share of the different tests is important only 
if the tests measure relatively different and distinct mental 
capacities. If they measure about the same capacities the 
existence of an unequal share of the different tests in the 
total scores is not serious. While it is, of course, true that 
the content or subject-matter of the various tests of our in- 
telligence scales is different, it is questionable whether they 
measure fundamentally different mental processes. We 
have no good evidence that they do. It appears, at any 
rate, that the distinction between what is measured by the 
various particular tests is not sufficiently clear to warrant 
the refinement in method which is represented by weight- 
ing their scores. 

The second aim of weighting is to make the scores of the 
items, or of tests, unequal. This type of weighting is based 
upon the assumption that the importance of the items or of 
the tests is not the same. The variation in importance is in 
general based upon one of two facts. The first is the differ- 
ence in difficulty of the various items of the test. It is 
sometimes assumed that the more difficult item should be 
given greater weight than the easier items. The items of the 
test are therefore multiplied by a factor which is propor- 
tional to their difficulty. The second basis for estimating 
the importance of items, or of tests, is their correlation with 
the criterion. When this basis is used, a test is weighted by 
multiplying by a factor which is roughly proportional to the 
correlation between that test or item and the criterion. 

Weighting for the purpose of making the share of tests in 
the total score unequal is coming to be less commonly used 
than it was a number of years ago. The procedure of deter- 
mining the weights and of applying them to the scores is a 
cumbersome one, and empirical studies do not seem to in- 
dicate that the resulting score is particularly better than the 
raw score. 

In a report communicated to the writer in manuscript, 
P. V. West gives the correlation between the weighted scores 
and the unweighted of six of the Army Alpha tests. The 
correlations are based upon the total scores, and they are as 
follows: 

Test          1      2      3      6      7      8 
Correlation   .975   .9756  .932   .966   .984   .940 

It is evident from these high correlations that the ranks of 
the individuals in the raw scores are practically the same 
as in the weighted scores. In another table, which is not re- 
peated here, the intercorrelations between the various tests 
based on the raw scores and on the weighted scores are given. 
In general these intercorrelations resemble each other very 
closely, indicating that the raw scores measure practically 
the same thing as is measured by the weighted scores. 

A series of similarly high correlations between weighted 
and unweighted scores in a standardized educational test in 
algebra was found by Douglass and Spencer.¹ Four correlation 
coefficients ranged from .98 to .996. Holzinger² 
also reports high correlations between weighted and un- 
weighted scores and points out that when this correlation is 
much higher than the self-correlation (reliability coefficient) 
of the test the use of weighted scores is unjustified. 


¹ Harl R. Douglass and Peter L. Spencer. “Is it Necessary to Weight 
Exercises in Standard Tests?” in Journal of Educational Psychology, vol. 14, 
pp. 109-12. 1923. 

² Karl J. Holzinger. “An Analysis of the Errors in Mental Measurements”; 
in Journal of Educational Psychology, vol. 14, pp. 278-88. 1923. 

These results would seem to agree with a common sense 
analysis of the problem. Assuming that the tests of a scale 
are arranged in ascending order of difficulty, and each item 
is assigned a score of 1, the person of high ability will always 
make a score superior to that of a person of low ability. The 
assigning of greater weight to the difficult problems as com- 
pared with the easy ones will make a difference in the re- 
lative size of the scores, but not in the order of the scores. 
Since the order or ranking of the scores is the important 
thing, and we seldom or never attempt to calculate the ratio 
between two scores, the unweighted score is as serviceable as 
the weighted score. 
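
The argument may be checked on a small hypothetical case. The sketch below assumes, as the paragraph does, that the items stand in ascending order of difficulty and that each person passes every item up to his level and fails the rest; the ten items and the weights 1 to 10 are invented for illustration. Where passes and failures are irregularly mixed, the two orders need not agree exactly.

```python
def raw_score(passed):
    """Unweighted score: one point for each item passed."""
    return sum(passed)


def weighted_score(passed, weights):
    """Weighted score: each passed item counts its own weight."""
    return sum(weight for ok, weight in zip(passed, weights) if ok)


def cumulative_pattern(ability, n_items):
    """Pass the first `ability` items (easiest first) and fail the rest."""
    return [index < ability for index in range(n_items)]


N_ITEMS = 10
WEIGHTS = list(range(1, N_ITEMS + 1))  # hypothetical: harder items weigh more

people = [cumulative_pattern(ability, N_ITEMS) for ability in range(N_ITEMS + 1)]
raw = [raw_score(p) for p in people]
weighted = [weighted_score(p, WEIGHTS) for p in people]

# Rank order of the persons under each method of scoring.
order_raw = sorted(range(len(people)), key=raw.__getitem__)
order_weighted = sorted(range(len(people)), key=weighted.__getitem__)
same_order = order_raw == order_weighted
```

The weighted scores differ in size from the raw scores, but under the stated assumption the two orderings coincide.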


5. Measures of relative standing 


In the previous section we have discussed the raw score 
and the method of obtaining it. It has been said that the 
raw score in itself has no meaning. It needs interpretation. 
The only original score, or raw score, which carries its own 
interpretation is the mental age. Even this, however, has a 
limited interpretation. We may now consider, first, the 
further methods of interpreting the mental-age score, and 
second, the methods of interpreting point scores. In 
general, these methods consist of turning the raw scores, 
which are in absolute terms, into relative scores. 

The first relative score which was used to interpret the 
mental age was a difference, namely, the difference between 
the mental age and the chronological age. Thus, if an in- 
dividual has a mental age of twelve and chronological age of 
ten, his intellectual superiority would be represented by a 
score of twelve minus ten, or two years. This means that an 
individual’s intellectual development is two years beyond 
what we should expect from his chronological age. 

As was remarked in discussing the Binet scale and its 
development, it was soon discovered that the significance of 
this mental age-chronological age difference was not the 
same for the different stages of the child’s life. A year’s 
difference was found to be more significant, when the meas- 
urement was made by the Binet scale, in the case of the 
young child than in the case of the older child. To avoid 
variation in the meaning of the measure, another measure 
was used which consists of a ratio rather than a difference. 
This is the familiar intelligence quotient, which is the ratio 
between the mental age and the chronological age. Thus a 
child whose mental age is twelve and chronological age ten 
would have an intelligence quotient of twelve divided by ten, 
or 1.20 (usually written 120). The same quotient would re- 
present the intelligence of the child whose mental age is six 
and chronological age five. The intelligence quotient of 
these two children would be the same, while the difference 
between the mental and chronological age would be twice 
as much in one case as in the other. 
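
The two children of the example may be compared under both measures in a short sketch. The function names are illustrative, and the quotient is expressed on the customary scale of 100.

```python
def mental_age_difference(mental_age, chronological_age):
    """The older relative score: mental age minus chronological age, in years."""
    return mental_age - chronological_age


def intelligence_quotient(mental_age, chronological_age):
    """The ratio of mental age to chronological age, written on a scale of 100."""
    return round(100 * mental_age / chronological_age)


older_child = (12, 10)   # mental age, chronological age
younger_child = (6, 5)

iq_older = intelligence_quotient(*older_child)
iq_younger = intelligence_quotient(*younger_child)
diff_older = mental_age_difference(*older_child)
diff_younger = mental_age_difference(*younger_child)
```

The quotients agree while the differences stand in the ratio of two to one.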

Since this ratio, which has been found empirically to work 
well with the Binet scale and its modifications, has been used 
in some instances with other types of scales, it is desirable to 
inquire into the assumptions upon which it is based and into 
the conditions under which it may legitimately be applied. 

The fundamental requirement of a relative measure of 
intelligence, as already suggested, is that it shall remain 
constant throughout the period of mental development. By 
constancy is here meant that a particular I.Q. shall have the 
same significance in all the ages to which it is applied. It 
does not refer to the constancy of the I.Q. of any particular 
individual. That is another aspect which must be discussed 
separately. The use of the I.Q. as a measure, however, 
means that a particular I.Q. shall have the same significance 
at six years that it has at ten or twelve years. To put it 
another way, if we should make a distribution of an un- 
selected group of individuals at a series of ages, and should 
calculate their I.Q.s, the individuals at corresponding points 
in the various distributions should have the same I.Q. For 
example, individuals at the lower quartile of each distribu- 
tion should have the same I.Q. as the individuals at the 
lower quartiles of all the other distributions. To put it 
another way, the variability or range of the I.Q.s at the vari- 
ous ages should be the same. This requirement seems to be 
met substantially by the I.Q.s calculated from the Binet 
scale. What is the explanation for this constancy of the 
I.Q.? 

We may analyze the situation most readily by the graphic 
method. As the writer has pointed out in another place, 
there are two statistical facts which are involved in the pro- 
blem. These are first, the form of the age progress curve, 
and second, the relative distribution of the scores in the 
succeeding ages. If either of these two factors is constant, 
the other one may vary in such a way as to make the I.Q. 
valid and comparable from year to year. That is, if the age 
progress curve is a straight line or if the increments from 
year to year are the same, the distribution of scores in suc- 
ceeding ages may be such as to render the I.Q. valid. On 
the other hand, if the distribution of scores from year to year 
is the same, the form of the age progress curve may be such 
as to render the I.Q. valid.¹ 

Fig. 12 illustrates the case in which the yearly increments 
are uniform but the spread of the distribution in the succeed- 
ing years increases uniformly and proportionately. It also 
involves the assumption that the growth curves have their 
origin at birth. The upper line represents the curve of the 
individuals of median ability; the lower line represents the 
curve of individuals at some lower level, in this particular 


¹ Frank N. Freeman. “The Interpretation and Application of the 
Intelligence Quotient”; in Journal of Educational Psychology, vol. 12, 
pp. 3-13. 1921. 
case with an I.Q. of .66. The mental age of the individuals 
at any point in the lower curve is found, of course, by finding 
the point on the median curve which is on the same horizontal 
level with it and then by projecting downward to find 
the age which corresponds with this point. Thus, the individual 
on the lower line of development at age nine has a 
mental development which corresponds to that of the median 
individual of age six. The intelligence quotient of this individual 
is, then, six divided by nine, or .66. In the same 
way the I.Q. of any individual at any point on the lower line 
may be found, and it will be seen that it is always the same. 
That this is so can be demonstrated geometrically on the 
principle that the sides of similar triangles are proportional. 

Fig. 12. Hypothetical Growth Curves to Give a Constant I.Q. 
(Vertical axis: Score; horizontal axis: Age.) 
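
The geometrical argument may also be verified numerically. The sketch below assumes two straight growth lines through the origin, the lower one rising at two thirds the rate of the median line; the unit slope and the function names are illustrative assumptions, not values read from the figure.

```python
from fractions import Fraction

RATIO = Fraction(2, 3)  # lower line rises at two thirds the rate of the median


def median_score(age, slope=1):
    """Score of the median individual: a straight line from the origin."""
    return slope * age


def lower_score(age, slope=1):
    """Score of the individual on the lower line at the same age."""
    return RATIO * slope * age


def mental_age(score, slope=1):
    """Age at which the median line reaches `score`."""
    return Fraction(score) / slope


def iq(age):
    """Mental age of the lower individual divided by chronological age."""
    return mental_age(lower_score(age)) / age


iqs = [iq(age) for age in range(1, 16)]
```

At age nine the mental age comes out as six, and the quotient is two thirds at every age tried.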

The second case is illustrated in Fig. 13. The upper curve, 
as before, represents the mental growth of the median in- 
dividual and the lower curve the growth of an individual of 
inferior capacity. In this case, the feature which is constant 
from year to year is the distribution of the scores. This is 
represented by the distance between the two lines. It will 
be seen that this distance is the same in succeeding parts of 
the curves. The curves of development, however, are not 
drawn as straight lines, but as logarithmic curves. That is, 
the heights of the curve at the points above the base line 
representing the various ages are the logarithms of these 
ages. For example, the logarithm of two is .301, of three, 
.477, of four, .602, of five, .699, and of six, .778. This produces 
a curve which rises sharply in the early ages and more 
and more slowly in the later ages. If, now, we calculate the 
intelligence quotients of various individuals which are represented 
by the lower curve, in the same fashion as for the preceding 
figure, we find that these quotients are the same at 
the various ages. In other words, an age development curve 
which has the logarithmic form, assuming that the distribution 
at succeeding ages is the same, will give a constant I.Q. 
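
A numerical sketch of this second case follows. It assumes that the median score is the common logarithm of the age and that the lower curve lies a fixed distance beneath it; the particular gap, the logarithm of 1.5, is chosen only for illustration, and the function names are hypothetical.

```python
import math

GAP = math.log10(1.5)  # constant distance in points between the two curves


def median_score(age):
    """Median growth curve: the common logarithm of the age."""
    return math.log10(age)


def lower_score(age):
    """Lower curve: the same form, a constant distance below the median."""
    return median_score(age) - GAP


def mental_age(score):
    """Age at which the median curve reaches `score` (inverse of the logarithm)."""
    return 10 ** score


def iq(age):
    """Mental age of the lower individual divided by chronological age."""
    return mental_age(lower_score(age)) / age


iqs = [iq(age) for age in range(2, 13)]
```

Since subtracting a constant from a logarithm divides the age by a constant, the quotient is the same at every age, here 1 divided by 1.5, or about .67.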

Another way of expressing the condition underlying the 
constancy of the I.Q. is to say that the overlapping of the 
scores of succeeding ages must be a proportionately increas- 
ing one from year to year. This increase in the overlapping 
of scores may be due to the diverging lines of yearly develop- 
ment, on the one hand, or to the decreasing rate of mental 
growth, on the other hand. An examination of the two 
figures will show that this increase in overlapping is regular 
and increases proportionately. 

It was the observation of this increase in the overlapping 
of scores from year to year which first called attention to the 
necessity of a ratio to express intelligence in the case of the 
Binet scale. This overlapping was not analyzed, however, 
in order to determine whether it was due to a decreasing rate 
in mental growth, to increasing range of distribution, or to 
a combination of the two. The issue has perhaps most 
clearly been set forth by Woodrow.¹ Woodrow has drawn 
his curves so as to represent both a decreasing rate of ma- 
turity and an increasing distribution from year to year. 
These curves have obviously been drawn empirically, how- 
ever, in such a way that the combination of these two factors 
will produce a constant intelligence quotient. The data of 
the Binet scale give us a measure of the amount of overlap- 
ping of scores from year to year, but since they are repre- 
sented in terms of mental age, or because they are standard- 
ized by age, they give us no means of determining which of 
these two factors or what combination of them is at the basis 
of the constancy of the I.Q. 


¹ Herbert Woodrow. Brightness and Dullness in Children, pp. 46-48. J. 
B. Lippincott Company, 1919. 

When we come to deal with point scales, we can determine 
independently the form of the age curve and the distribution 
of scores from year to year. It is not necessary to 
discuss the fact in detail at this point. Illustrations are 
given in Chapter XIII. We may sum up the matter by 
saying that the age-growth curve seems to approach much 
more nearly a straight line than a logarithmic curve, within 
the limit of those ages for which a particular test is well 
suited, and up to the period of adolescence. So far as the 
distribution is concerned it seems to increase somewhat from 
year to year, but not enough to make the intelligence quo- 
tient constant. 

Rand has shown, in the article cited on p. 282, that in 
fact, there is not a combination of decreasing mental growth 
and increasing range of distribution in the case of most point 
scales, so as to produce the proportional increase and over- 
lapping from year to year which is necessary as a foundation 
for a valid I.Q. The I.Q. is not a suitable measure, then, for 
use with the ordinary point scale. 

Examples of the variations among I.Q.s derived from different 
tests and a suggested method of equating the I.Q.s on 
different tests are given by Miller.¹ The method of equating 
consists of a chart by means of which the I.Q. on a given test 
may be translated into terms of the standard deviation of 
the I.Q. or the reverse. This method serves to equate scores 
on different tests, but not to equate scores for different ages 
of the same test. 

Another ratio to express relative intellectual capacity, 
somewhat similar to the intelligence quotient, is the co- 
efficient of intelligence. This ratio, which was first used by 
Yerkes, Bridges, and Hardwick in their point scale, is the 


¹ W. S. Miller. “The Variation and Significance of Intelligence Quotients 
Obtained from Group Tests”; in Journal of Educational Psychology, vol. 15, 
pp. 359-66. 1924. 
ratio between the point score of the individual and the point 
score which is the norm for his age. Thus if the norm for a 
given age is 100, and the individual makes a score of 80, his 
coefficient of intelligence is .80. 

The coefficient of intelligence has not been very widely 
used and its relation to the intelligence quotient has not been 
much discussed. A moment’s thought, however, will show 
what that relationship is. If the coefficient of intelligence is 
to be a valid relative measure of intelligence, it must, like the 
intelligence quotient, have the same significance from age to 
age. That is, it must be constant. The condition necessary 
to make the coefficient of intelligence constant is that the 
spread of the distribution in succeeding ages shall increase 
proportionately. The coefficient of intelligence, unlike the 
intelligence quotient, is not affected by the form of the age 
progress curve, except as this may affect the spread of the 
distribution. The only case in which both the intelligence 
quotient and the coefficient of intelligence can remain con- 
stant is the one in which the age progress curve is straight and 
the spread of the distribution increases regularly and pro- 
portionately. 
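
The contrast between the two quotients may be illustrated with hypothetical curves. In the sketch below, case A assumes a straight growth line from the origin with a proportionately increasing spread, and case B assumes a logarithmic growth curve with the same spread in points at every age; the ages, the gap, and the function names are assumptions made for illustration only.

```python
import math


def coefficient_of_intelligence(score, norm):
    """Ratio of the individual's point score to the norm for his age."""
    return score / norm


AGES = [4, 8, 12]

# Case A: straight-line growth from the origin; the lower individual
# always scores two thirds of the norm, so the spread grows with the norm.
ci_linear = [coefficient_of_intelligence((2 / 3) * age, age) for age in AGES]

# Case B: logarithmic growth with a constant gap in points. The I.Q. is
# constant in this case, but the coefficient of intelligence is not.
GAP = math.log10(1.5)
ci_log = [
    coefficient_of_intelligence(math.log10(age) - GAP, math.log10(age))
    for age in AGES
]

example = coefficient_of_intelligence(80, 100)  # the case given in the text
```

In case A the coefficient keeps the same value at every age; in case B it rises with age, since a fixed gap in points becomes a smaller fraction of a growing norm.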

The variability in the I.Q. and the C.I. in different tests 
or in different ages of the same test is amply illustrated in a 
paper by Gertrude Rand.¹ She shows that, in general, the 
variability of I.Q.s increases with age while the variability 
of C.I.s decreases with age. The latter phenomenon is of 
course due to the fact that the variation in scores with in- 
creasing age is not proportional to the increase in the scores. 
The increase in spread of I.Q.s is due to the fact that the 
negative acceleration of the curve of mental growth and 
the increase in the variability of test scores in succeed- 


¹ Gertrude Rand. “A Discussion of the Quotient Method of Specifying 
Test Results”; in Journal of Educational Psychology, vol. 16, pp. 599-618. 
1925. 


SCORES AND NORMS 283 


ing ages combined are not sufficient to keep the I.Q. con- 
stant. Miss Rand’s data also reveal an enormous dif- 
ference in the spread of I.Q.s in different tests. 

The case seems rather paradoxical. First we find that the 
I.Q. is approximately constant in the case of the Stanford-Binet 
Scale. From analysis we conclude that this constancy 
implies that there is a diminishing rate of growth or an in- 
creasing divergence in abilities or both. On the other hand, 
it appears that the I.Q. is not constant for many point scales, 
nor comparable among the various scales. Furthermore, to 
anticipate, we shall find that point scales do not give us the 
conditions which we have found by analysis to be neces- 
sary to give a constant I.Q. The Stanford-Binet seems to 
indicate one type of development and the point scales 
another. 

The solution of the paradox is to be found in the fact that 
the form of mental-growth curves depends not only on the 
fundamental nature of mental development itself, but also 
on the characteristics of the scale which is used to measure it. 
Thus, some scales will show a retardation in mental growth 
at a particular period while others show a uniform rate of 
advancement at the same period. We cannot draw universal 
conclusions from the results of a single scale regarding either 
the applicability of a particular type of score, such as the I.Q., 
or the form of the curve of mental growth. This latter point 
regarding mental growth will be amplified in the chapter on 
that subject. 

Another relative score which looks superficially like the 
I.Q. or the coefficient of intelligence, but which is based on a 
fundamentally different assumption, is the index of bright- 
ness, or the I.B., first used by Otis. The index of brightness 
is found by calculating the difference between the individual’s 
score and the norm for his age, and then, according as this 
difference is plus or minus, adding it to, or subtracting it 
from 100. Thus if the norm is 90, and the individual score 
is 97, his index of brightness is 7 + 100, or 107. If the in- 
dividual score is 83 the I.B. is 93. It is obvious that a given 
amount of superiority of a score in points, or inferiority in 
points, is given the same significance at various ages by this 
method of calculation. Ten points above the median at age 
six means exactly the same thing as ten points above the me- 
dian at age twelve. This is fundamentally different from the 
principle underlying the coefficient of intelligence. It presup- 
poses, if the I.B. is to be constant, that the spread of the 
distribution in succeeding ages is identical in terms of points, 
and that the curves of age progress of individuals of various 
degrees of capacity are parallel. 
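
The calculation of the index of brightness is simple enough to state in a line. The function name below is illustrative; the figures are those of the example just given.

```python
def index_of_brightness(score, norm):
    """Add the difference between score and age norm to 100.

    A score below the norm gives a negative difference, so the same
    expression covers both the "adding" and the "subtracting" case.
    """
    return 100 + (score - norm)


above = index_of_brightness(97, 90)
below = index_of_brightness(83, 90)
```

Because only the difference in points enters the calculation, seven points above the norm yields 107 whatever the age or the size of the norm.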

The validity of this measure is not affected by the form of 
the age-progress curve. It is therefore possible that it may 
be consistent with the I.Q. It is improbable that it should 
be so consistent, however, since this would be true only in 
case the constancy of the I.Q. is based wholly upon the form 
of the progress curve and not to any extent upon the relative 
spread in distribution from age to age. 

It is surprising that with all the time and money which 
has been spent upon intelligence tests within the last few 
years, we are not able to say more positively which of the 
assumptions, if any, which are implied in these various 
measures are in accord with the fact. 

The final measure of relative standing which may be 
mentioned is the percentile rank. This is commonly ex- 
pressed as a ten percentile rank, or the rank according to the 
tenth of the distribution in which the individual score falls. 
By this method the possible score on any test would range 
from one to ten, and the scores on all tests which are ranked 
in this fashion are comparable. The first influential use of 
the percentile rank method of scoring, so far as the writer is 
aware, was made by Woolley in her series of tests at Cincinnati.¹ 
This method of scoring was later adopted by 
Pintner and Paterson in their scale of performance tests, by 
Pintner in his Mental Survey, and by Seashore in his music 
tests, and others. 

The percentile rank has the advantage of simplicity and 
convenience. It has the theoretical defect, however, that it 
assumes the rectangular distribution of abilities instead of 
the normal distribution, to which the distribution of abilities 
in fact more nearly conforms. To illustrate, according to the 
normal distribution, the lowest ten per cent of a group of in- 
dividuals would cover a much wider range of the scale, which 
is represented by the base line of the distribution, than 
would a ten per cent group near the center of the distribu- 
tion. By the rectangular distribution, however, the ten per 
cent at the low end or the high end of the scale would cover 
the same distance as the ten per cent in the middle. As a 
consequence, the percentile method is not suitable for precise 
scoring. 
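
Both the method and its defect may be sketched briefly. The ranking rule below, which counts the scores falling below the given score, is one reasonable convention among several and is an assumption; the widths are stated in standard-deviation units of a normal distribution, and since the lowest tenth has no lower bound it is cut off at the first percentile for purposes of comparison.

```python
from statistics import NormalDist


def decile_rank(score, group_scores):
    """Tenth of the distribution (1 to 10) in which `score` falls."""
    below = sum(other < score for other in group_scores)
    return min(10, below * 10 // len(group_scores) + 1)


def width_in_sigma(lower_percent, upper_percent):
    """Distance on the base line of a normal curve between two percentiles."""
    unit_normal = NormalDist()
    return unit_normal.inv_cdf(upper_percent / 100) - unit_normal.inv_cdf(lower_percent / 100)


GROUP = list(range(1, 101))            # a hypothetical group of 100 scores
lowest = decile_rank(10, GROUP)
highest = decile_rank(100, GROUP)

tail_tenth = width_in_sigma(1, 10)     # lowest tenth, cut off at the 1st percentile
middle_tenth = width_in_sigma(45, 55)  # the tenth centered on the median
```

On the normal assumption the tenth at the extreme covers several times the distance on the scale that the middle tenth covers, although each is scored as a single step.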


6. Measures of the relation between intelligence and 
achievement 


Intelligence tests are assumed by theory to measure native 
capacity. Educational tests are assumed to measure the 
actual achievement which the individual makes. This is 
the product of his native capacity and the training which he 
has received, plus certain character traits and general en- 
vironmental influences. It was inevitable that the attempt 
should be made to bring these two measures into relationship 
to one another in order to determine the degree to which the 
individual’s achievement corresponds to his capacity. This 
was first done, so far as the writer is aware, by Buckingham 


¹ H. T. Woolley. “A New Scale of Mental and Physical Measurements 
for Adolescents and Some of its Uses”; in Journal of Educational Psychology, 
vol. 6, pp. 521-50. 1915. 
and Monroe, in connection with their Illinois Examination.¹ 
These authors call the measure of relative achievement the 
“achievement quotient,” or A.Q., and find it by dividing the 
achievement age by the mental age. Perhaps the most 
elaborate use of such a quotient as this has been made by 
Franzen.² Franzen follows substantially the same procedure 
as was followed in the Illinois Examination. He first 
finds the subject ratios of the various individual school sub- 
jects. These are the ratios between the subject ages and the 
mental ages. The average of these subject ratios he calls the 
accomplishment ratio (Acc. R.). The accomplishment ratio, 
then, is the same as Buckingham and Monroe’s achievement 
quotient. 
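
The two calculations may be set side by side in a short sketch. The subject ages and the mental age are invented for illustration, the function names are hypothetical, and the ratios are written on the scale of 100 which is customary for such quotients.

```python
def achievement_quotient(achievement_age, mental_age):
    """Achievement age divided by mental age, on a scale of 100."""
    return 100 * achievement_age / mental_age


def accomplishment_ratio(subject_ages, mental_age):
    """Average of the subject ratios (each subject age over the mental age)."""
    subject_ratios = [
        achievement_quotient(age, mental_age) for age in subject_ages.values()
    ]
    return sum(subject_ratios) / len(subject_ratios)


# Hypothetical pupil: mental age 10, with three subject ages.
SUBJECT_AGES = {"reading": 11, "arithmetic": 9, "spelling": 10}
acc_r = accomplishment_ratio(SUBJECT_AGES, mental_age=10)
```

Here the subject ratios are 110, 90, and 100, and the accomplishment ratio is their average, 100.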

The interpretation of the achievement quotient or the 
accomplishment ratio has not always been clear, and its use 
has sometimes led to interpretations which were absurd. 
For example, the accomplishment ratio is sometimes de- 
scribed as measuring the relation between one’s achievement 
and one’s capacity. According to this definition it would be 
impossible to have an accomplishment ratio above 100, 
since one could not achieve beyond his capacity. We do as 
a matter of fact, however, find a large number of accomplish- 
ment ratios above 100. We escape this particular difficulty 
if we describe the accomplishment ratio as a measure of the 
relation between the individual’s accomplishment age and 
his mental age, and use these terms to represent empirical 
measures instead of assuming that they measure exactly the 
underlying facts of capacity and achievement. Detailed 


¹ See the account of this examination in the following bulletins: Walter S. 
Monroe. The Illinois Examination. University of Illinois Bulletin, vol. 19, 
no. 9, 1921. W.S. Monroe and B. R. Buckingham. Teachers’ Handbook 
to the Illinois Examination, I and II, Public School Publishing Company. 

² R. H. Franzen. “The Conservation of Talent”; being Chapter IV of 
Terman, et al., Intelligence Tests and School Reorganization. World Book 
Company, 1922. 
criticisms have been made of the accomplishment ratio, 
both from the analytical and the statistical point of view, by 
Toops and Symonds and by Chapman.¹ 

It would carry us beyond the limits of our space to dis- 
cuss in detail all the questions which are raised concerning 
the achievement quotient or the accomplishment ratio. 
We may, however, comment upon a few of the most im- 
portant implications and their practical significance. 

The accomplishment ratio seems to imply, in the first 
place, that the intelligence test score gives a measure of na- 
tive capacity which is independent of training, on the one 
hand, and that the educational test score gives an independ- 
ent measure of achievement, on the other hand. This does 
not mean that intelligence and achievement are unrelated. 
On the contrary, it is usually assumed by those that use the 
accomplishment ratio that they are so closely related that it is 
possible to make them correspond almost exactly. What is 
meant is that the particular tests which are used to measure 
intelligence are not affected by the accident of training, or of 
other mental traits than intelligence, and that the achieve- 
ment tests will respond delicately to the changes in educa- 
tion or in effort, while the intelligence remains constant. 

The distinction between what is measured by the two 
types of test is not as clean-cut and definite as is implied in 
the assumption above mentioned. This means that in our 
reasoning about these measures and their relationship to one 
another we must treat them as rough, empirical measures, 
and not as highly refined measures of independent vari- 
ables. 

¹ Herbert A. Toops and P. M. Symonds. “What Shall We Expect of 
the A.Q.?” in Journal of Educational Psychology, vol. 13, pp. 513-28 (1922), 
and vol. 14, pp. 27-38 (1923). 

J. Crosby Chapman. “The Unreliability of the Difference Between 
Intelligence and the Educational Rating”; in Journal of Educational 
Psychology, vol. 14, pp. 103-08. 1923. 

In the second place, the accomplishment ratio implies 
that, so far as the intellectual factor is concerned, achieve- 
ment is based upon the same capacities as are involved in 
intelligence. It implies, furthermore, that achievement is 
also affected by other factors, such as interest, effort, and 
training. If these other factors, according to the hypothesis, 
are controlled and made equal in their effect upon the score, 
the achievement score will correspond completely to the 
intelligence score. This is an extreme example of the theory 
of general intelligence, which holds that every type of 
achievement is dependent upon the same kind of capacity. 

A consequence of the assumption that achievement is 
always based upon the same kind of intellectual capacity is 
that if everybody did his best, and the factors of training and 
physical and mental environment were the same, the achieve- 
ments of all individuals of the same mental age would be 
identical. 

The prospect of being able to bring the accomplishment 
of every individual into exact harmony with his potential 
achievement is a pleasing one to contemplate, but it probably 
cannot be done with anything like the exactness which 
is implied in using our present measures in the manner which 
has been indicated. The assumption that the intellectual 
factor in achievement in the various school subjects corre- 
sponds perfectly, and corresponds with general intelligence, 
is very doubtful, to say the least. A further assumption, 
which follows from this, that the variation which we find 
between the relation of the intelligence scores to the achieve- 
ment scores is due solely to non-intellectual factors, such as 
effort, and so on, so that the individuals of a given mental 
age who make high achievement scores may be credited with 
an approach to maximum effort, whereas those who make 
low scores must be considered as making very deficient 
effort, is, again, a doubtful assumption. The apparent verification 
of this assumption by the results of training in 
Franzen’s experiment, which raised the correlation between 
intelligence and achievement, is undoubtedly the result, in 
part at least, of reasoning in a circle; for it is probable that 
pressure was brought to bear upon the individuals in proportion 
as their achievement scores fell below their corresponding 
intelligence scores. A very significant fact, particularly 
with reference to certain subjects, as spelling, is that 
individuals with high I.Q.s have lower accomplishment 
ratios than those with low I.Q.s. This may be due partly to 
the fact that individuals of low I.Q. are more strongly stimu- 
lated to achievement because of the fact that they tend to 
fall behind in school. It is due also, however, to the fact 
that when we classify individuals according to their I.Q.s 
and expect them to achieve in proportion, we do a certain 
injustice to the individuals of high I.Q. Their capacity for 
achievement does not correspond completely to their scores 
in the intelligence test. 

It is commonly observed that pupils of a given mental age 
whose I.Q.s are low have a higher A.Q. than those whose I.Q.s 
are superior. This is interpreted to mean that pupils of a low 
I.Q. work more nearly up to their capacity because they are 
stimulated more vigorously than their brighter companions. 
This greater stimulation of dull pupils very likely occurs, and 
it may account in part for the fact we are discussing, but it 
is certainly not the sole cause. Pupils of higher I.Q. must 
have a lower A.Q. as a consequence of the fact that the 
ability measured in the intelligence test and the ability re- 
quired for achievement in the various subjects is not identi- 
cal. 

Take the extreme case in which there may be assumed to
be no correlation between intelligence and achievement in a
particular subject. We might take five groups of pupils all
of the same age, representing five different levels in I.Q. or
M.A. Because there is, by hypothesis, no correlation between
intelligence and achievement, in this case, the groups
would be in descending order in A.Q., since the higher the
I.Q. the lower would be the A.Q. This descending order would
appear in some measure wherever the correlation between
intelligence and capacity in the subjects is less than
perfect. The assumption which is commonly made is that the
correlation would be perfect, if it were not for variations in
effort and other factors apart from general intelligence. It is
safe to say that this assumption is false.

[Figure: five levels of I.Q., pupils of the same age; equal average achievement.]

To this difficulty is added one which grows out of the
difference between the numerical significance of E.Q.s and
I.Q.s. Data have been assembled by Rand¹ which show
that the distribution of E.Q.s is narrower than is the
distribution of I.Q.s. If this difference exists it necessarily
causes the higher A.Q.s to be lower and the lower
A.Q.s to be higher than are the corresponding I.Q.s.
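
The argument of the last paragraphs may be checked with a little arithmetic. The sketch below assumes the usual definition of the accomplishment quotient, A.Q. = 100 × E.Q. / I.Q.; the five I.Q. levels, the function names, and the shrinkage of one half applied to the E.Q.s are hypothetical values chosen only for illustration.

```python
def accomplishment_quotient(eq, iq):
    """A.Q. taken as the ratio of E.Q. to I.Q., multiplied by 100."""
    return 100.0 * eq / iq

# Hypothetical I.Q. levels for five groups of pupils of the same age.
iq_levels = [80, 90, 100, 110, 120]

# Case 1: no correlation between intelligence and achievement, so every
# group has the same average E.Q. (taken here as 100).
aq_uncorrelated = [round(accomplishment_quotient(100, iq), 1) for iq in iq_levels]

# Case 2: achievement follows intelligence perfectly, but the E.Q.s are
# spread over a narrower range than the I.Q.s (half the deviation from 100).
def narrowed_eq(iq, shrink=0.5):
    return 100 + shrink * (iq - 100)

aq_narrow = [round(accomplishment_quotient(narrowed_eq(iq), iq), 1) for iq in iq_levels]

for iq, a1, a2 in zip(iq_levels, aq_uncorrelated, aq_narrow):
    print(f"I.Q. {iq:3d}   A.Q. (no correlation) {a1:6.1f}   A.Q. (narrow E.Q.) {a2:6.1f}")
```

In both cases the A.Q. falls steadily as the I.Q. rises, although no difference in effort has been assumed anywhere in the calculation.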

What, then, may we conclude concerning the practical
significance of the achievement quotient, or the accomplish- 
ment ratio? If we do not press the implication of such a 
ratio too far, and if we regard the measures both of intelli- 
gence and of achievement as rough approximate measures, 
the use of the accomplishment ratio will do no harm. 
Roughly speaking, and in general, those individuals who
have a low accomplishment ratio probably either need stim- 
ulation, or are suffering under some kind of unfavorable 
conditions. An intensive study of such pupils should be
made in order to discover their temperament and character
traits, and the possible influence of these traits upon
accomplishment, and also in order to discover any accidental
environmental conditions or habits of life which may affect
achievement. In making this study of pupils of low
accomplishment ratio, however, we must not lose sight of the
pupils of higher achievement, but we must recognize that
they also may be doing poorer work than they are capable of
doing.

1 Gertrude Rand, op. cit.


7. Norms 


The word norm may be used in two senses. In the first 
place, it may be taken to mean a standard of comparison 
to which it is implied that the various individuals of a group 
should conform. In the second place, it may mean simply 
the central tendency of the scores of a specified group with- 
out any implications concerning the desirability of indi- 
viduals conforming to it. We shall use the term here in the 
neutral sense of the central tendency of a specified group. 

The norm is inherent in the score of the age scale. This 
means that the standardization of the age scale, and the 
nature of the score in this scale, is of such a character that 
the relationship between the score of the individual and the 
average of the group is apparent in the score itself. We may 
pass immediately then to the discussion of norms in point 
scales.

The central tendency, which is taken as the norm, is 
ordinarily either the median or the arithmetical mean of the 
group to which the norms apply. Norms may be classified 
according to the basis which is used for making up the 
groups. 

The most widespread and significant norms are those 
which are based upon age grouping. Age norms consist of 
medians or means of the scores made by children of suc- 
cessive age groups. The chief problem which arises in the 
determination and interpretation of age norms is the
selection of the cases. This selection must always be restricted
to some degree. An age norm can never represent all the 
children of that age in the entire world. It is hardly likely, 
at the present time at least, that we can secure norms which 
represent the children even of the civilized world. The 
widest area which has been covered in any serious attempt 
to secure age norms has been a single country. 
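
As a minimal sketch of what the calculation of age norms involves, the following groups a set of scores by age and takes the median (or, optionally, the mean) of each group. The records and the function name are hypothetical and stand for no actual standardization.

```python
from collections import defaultdict
from statistics import mean, median

def age_norms(records, central=median):
    """Return {age: central tendency of the scores made at that age}."""
    by_age = defaultdict(list)
    for age, score in records:
        by_age[age].append(score)
    return {age: central(scores) for age, scores in sorted(by_age.items())}

# Hypothetical (age in years, point-scale score) pairs.
records = [
    (9, 31), (9, 35), (9, 40),
    (10, 38), (10, 44), (10, 47),
    (11, 45), (11, 52), (11, 56),
]

median_norms = age_norms(records)              # norm taken as the median
mean_norms = age_norms(records, central=mean)  # norm taken as the mean
print(median_norms)
print(mean_norms)
```

The value of such norms depends entirely upon how the cases entering `records` were selected, which is the problem taken up in the text.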

Within any one nation, such as the United States, there 
are numerous groups which may differ from one another in 
intellectual capacity. For example, there are environmental 
groups. Environmental groups would be represented by the 
children belonging to a particular neighborhood in a par- 
ticular community. There are racial groups, sex groups, and 
occupational groups. If our aim is to secure norms which
shall be representative of all the inhabitants of the country,
and if we cannot test every individual of a given age, as we
obviously cannot, it is necessary, for a completely
representative norm, to secure a sample in which the individuals
of each group are represented in the same proportion as
they are represented in the population as a whole, assuming,
of course, that there are differences in intelligence
between these groups.
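
The proportional sampling described above may be sketched as follows. The group names and population counts are hypothetical, and the largest-remainder rule used for rounding is merely one convenient choice, not a procedure taken from any published standardization.

```python
def proportional_allocation(population_counts, sample_size):
    """Allot a sample to groups in proportion to their share of the population.

    The largest-remainder rule is used so that the allotments add up
    exactly to sample_size.
    """
    total = sum(population_counts.values())
    exact = {g: sample_size * n / total for g, n in population_counts.items()}
    allotment = {g: int(x) for g, x in exact.items()}
    shortfall = sample_size - sum(allotment.values())
    # Give the remaining places to the groups with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda g: exact[g] - allotment[g], reverse=True)
    for g in by_remainder[:shortfall]:
        allotment[g] += 1
    return allotment

# Hypothetical numbers of ten-year-olds in three kinds of community.
population = {"city": 52000, "small town": 21000, "rural": 27000}
sample = proportional_allocation(population, 1000)
print(sample)
```

Each group then contributes to the norm in the same proportion as it contributes to the population, which is the condition the text lays down for a completely representative norm.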

It may be said at once that no such systematic method of 
sampling, in order to secure age norms for children, has ever 
been carried out. Two chief methods for the purpose of
securing an approximation through random sampling have 
been employed. The first one, which was employed by 
Terman in his standardization of his Stanford Revision of 
the Binet scale, was to select a community which might be 
presumed to represent neither extreme of ability, and then 
to test all of the children of that community. The other 
method, which is more commonly employed in securing 
norms for group point scales, is to test as large a number of 
children as possible, of various races and in various parts of
the country, and then to assume that the different groups 
will be represented in the total group in the same propor- 
tion as they are in the country as a whole. These methods 
probably secure norms which fairly well approximate norms 
to be secured by a more systematic method of sampling. 

Tests have practically all been standardized upon school 
children, and norms have been secured from children in the 
school. This results in the limitation of the norms to those 
ages at which practically all of the children are in school 
and can be tested. Satisfactory age norms have usually 
been limited to the ages seven to thirteen. Below age seven 
some children are not yet in school, and beyond age thirteen 
some of the children, the brighter ones, of course, have gone 
on to high school and are not usually included in the testing 
program. If the tests are extended to the high school there
begins to be an elimination of the duller pupils beyond the 
age of fourteen. Norms for the ages fifteen and above are 
therefore less representative than for the ages below. 

Age norms have been criticized on the ground of the 
alleged fact that stages in intellectual development are not 
well represented by chronological age. It is a well-estab- 
lished fact that individual children differ widely in the rate 
at which they mature physiologically, and that children of a 
given chronological age represent rather widely different 
stages of physiological maturity. It is believed, further, 
that the rate of intellectual maturing corresponds more 
closely to physiological maturity than it does to chronologi- 
cal age, and that if we can find a convenient measure of 
physiological maturity it is desirable to substitute an index 
of physiological maturity for chronological age in establish- 
ing norms and in comparing the scores of individuals with 
the norms. 

It is not certain, however, that intellectual maturity
corresponds more closely to physiological maturity than to
chronological age, although it would seem natural that it
should do so. In a study by T. M. Carter, the partial
correlation, with age constant, was calculated between mental
age and the ratio of ossification of the bones of the wrist.
This ratio of ossification is the best measure that we have up
to the present of physiological maturity.¹ The correlation
was found to be practically zero. If we take the mental age
of an individual to be determined both by his intelligence 
and by the degree of his maturity, and if intelligence is not 
related to the rate of maturing, then mental age should be 
correlated with the measure of the stage of physiological 
maturity to the extent that the degree of mental maturity 
is represented by mental age, on the one hand, and is re- 
lated to physiological maturity, on the other hand. Since 
we find no correlation we must conclude either that there is 
not a significant difference in the rate of intellectual maturing 
or that the rate of intellectual maturing does not correspond 
closely to the rate of physiological maturing. Carter found, 
further, that there was a closer correlation between mental 
age and chronological age than between mental age and ratio 
of ossification. There is not sufficient evidence to warrant the 
substitution, at the present time, of a physiological measure 
for chronological age in calculating mental age norms. 
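
For readers who wish to see what a partial correlation with age constant amounts to, the standard first-order formula is sketched below. The three zero-order coefficients are hypothetical and are not Carter's figures; they are chosen to show how two measures which both increase with age may correlate substantially and yet show practically no relation once age is held constant.

```python
from math import sqrt

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical zero-order correlations (not taken from Carter's study):
# x = mental age, y = ratio of ossification, z = chronological age.
r_ma_oss = 0.56
r_ma_age = 0.80
r_oss_age = 0.70

r_partial = partial_correlation(r_ma_oss, r_ma_age, r_oss_age)
print(round(r_partial, 3))
```

With these assumed values the whole of the apparent relation between mental age and ossification is accounted for by their common dependence on chronological age.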


1 Thomas M. Carter. A Study of Radiographs of the Bones of the Wrist as a
Means of Determining Anatomical Age. A dissertation submitted in
candidacy for the degree of Doctor of Philosophy, University of Chicago
Library, 1923.

See also Frank N. Freeman and T. M. Carter. “A New Measure of the
Development of the Carpal Bones and its Relation to Mental and Physical
Development”; in Journal of Educational Psychology, vol. 15, pp. 257-70.
1924.


8. Grade norms


Grade norms have been used less frequently with
intelligence tests than with educational tests. Their
interpretation is much more ambiguous than is the interpretation of
age norms. This is due to the fact that the age composition
of a grade in one school system may be very different from 
the age composition in another system. The amount of re- 
tardation or acceleration differs greatly from one commu- 
nity to another, and the age at entering school may also differ. 
If the pupils of a grade in one city have the same average 
intelligence scores as the pupils of the same grade in another 
city, it might be due to the fact that they possess the same 
intelligence, or it might be due to the fact that pupils in one 
community had a higher intelligence and were also farther 
advanced in the school. Furthermore, even with pupils in 
a given community, and with a given average intelligence, 
it might be possible to change the composition of a grade 
without changing the average intelligence score. That is, 
dull pupils might be eliminated from the grade and bright 
pupils added to it. To put it in another way, the com- 
position of a grade is determined by the promotion policy 
which is in force. 

If both age and grade norms are furnished with a test, and 
if the school administrator compares the scores made by the 
children of his system with both norms, he is likely to meet a 
situation which seems, at first sight at least, to be anoma- 
lous. He may find that the majority of the children are up to 
the age norms, but that a large majority make scores inferior 
to the grade norms. This has been found to be true, for 
example, in the use of the Haggerty scale, Delta 2. This 
situation is difficult to interpret. In order to interpret it, 
it is necessary that one have at his command all the facts 
concerning the grade progress of the children whose scores 
furnish the basis for the norms, and also of the children in 
the system which is being tested. These facts are never at 
hand. It seems, therefore, that grade norms for intelligence 
tests are of little practical value. 


9. Norms for sex, race, and for social groups


In the discussion of age norms it was assumed that com- 
posite scores should be secured from all of the groups com- 
posing a community. It is sometimes held, however, that 
separate norms should be found for various groups, and that 
individuals of these groups should be judged each by com- 
parison with norms for his particular group. 

In considering the desirability of such group norms we 
must raise two questions. There is first the question of 
fact as to whether there exist sufficient differences between 
groups to make the norms desirable. If no significant 
differences between groups are found, then, of course, sepa- 
rate norms would have no meaning. If significant differ- 
ences are found, then we are faced with a different question. 
What is the purpose of norms, and will this purpose be 
better served by differential group norms, or by composite 
norms? We may consider these questions individually with 
reference to the three types of group norms which have been 
composed. 

Sex norms. The prevailing view at the present time is that 
sex differences in intelligence tests are so small as to make it 
unnecessary to calculate separate norms for boys and girls. 
In fact, separate averages for the two sexes are not furnished 
with the majority of intelligence tests. Yerkes, Bridges, 
and Hardwick, in their original report on the point scale, 
stated that sex differences were large enough to demand 
separate norms. In the revised edition by Yerkes and 
Foster, however, the following statement is made: (p. 87) 
“On the basis of total score for the entire scale no significant 
sex differences can be made out from the original point scale 
results, but there seem to be sex differences in the ease with 
which certain of the individual tests are passed.” In
Terman’s report upon the Stanford Revision of the Binet scale,¹
he writes that there are slight differences in favor of the girls
up to age thirteen. These differences, however, amount to
only from two to four per cent and he does not consider them
sufficient to warrant separate norms. Woodrow, from a
study of a small group, concludes that the girls are superior
to the boys, but that this superiority is not as great as we
should expect from their comparative precocity in
physiological development. He calculates, therefore, that girls in
reality are inferior to the boys.² In view of the facts which
were mentioned above concerning the relation between
physiological maturity and mental maturity, this conclusion
is, to say the least, a hazardous one. The best evidence
which is now available indicates that sex differences in
general intellectual capacity are negligible so far as the
construction of norms is concerned.

1 Stanford Revision of the Binet-Simon Scale, chap. IV.

Race norms. The existence of race differences in intellect- 
ual capacity will be discussed at some length in a later 
chapter. We may anticipate the conclusion of that dis- 
cussion so far as to say that there appear to be significant 
differences between certain races in the capacity which is 
measured by our general intelligence tests. Whatever may 
be the ultimate explanation of these differences, they do 
now exist as a matter of objective fact. The largest differ- 
ences of which we now have evidence are between the negroes 
and the Indians on the one hand, and the whites taken as a 
group, on the other hand. 

Granting that these differences exist, does it follow that 
we should have separate norms? This raises the question
concerning the purpose of the norms. Those who favor 
racial norms would say that the purpose of norms is to de- 
termine which individuals are normal and which, in distinc- 
tion from them, are above normal by various degrees, or 
below normal by various degrees. They would add that
what is normal for one race is different from what is normal
for another race. To apply to a race a standard which would
result in rating a large majority of the individuals subnormal
would be to contradict the meaning of normal. Normal,
according to this view, is that which is usual, and therefore the
majority of individuals of any group must be rated as normal.

2 Brightness and Dullness in Children, p. 121.

If we base our decision upon this rather formal definition 
of the normal, we still have the alternative of considering 
a particular racial group as a distinct unit, or as a part of 
a composite group, which is made up of all the inhabitants 
of a given community. The treatment of a racial group as 
separate and distinct does not grow out of the necessities 
of the case, but must be justified by showing that such 
treatment gives ratings which are of greater practical use- 
fulness than are obtained from composite ratings. 

Take an illustration. The application to inferior racial 
groups of composite norms results in classifying a larger 
number as feeble-minded than would be so classified by the 
application of racial norms. Are the individuals who are 
thus classified as feeble-minded comparable to the smaller 
number of the superior race classified as feeble-minded, and 
do they demand the same treatment? The same question 
could be applied to the classification of the individuals at 
the upper end of the intellectual scale. 

The prevailing view would probably be that so far as 
measurement by any absolute standard is concerned, the 
larger number of the inferior group which is rated feeble- 
minded by a test is comparable to the smaller number of 
the superior group which is so rated. Feeble-mindedness, 
in other words, represents a certain degree of capacity, and 
not the existence of a trait which is absent from normal 
individuals, nor the absence of a trait which is present in 
normal individuals. At any rate, this is true of degrees of 
deficiency above that of feeble-mindedness. 

On the other hand, the kind of rating which should be
given an individual depends upon the significance which 
that rating has with reference to his capacity to adjust 
himself in his environment. If an individual of an inferior 
race intellectually comes into contact and competition 
wholly or chiefly with other individuals of the same race, 
successful adjustment demands a lesser degree of ability 
than if he comes into contact and competition with indi- 
viduals of a superior race. If the individuals of a race, then, 
are largely segregated in their social, industrial, and com- 
mercial life, it would seem preferable to apply to them 
norms which have been derived from their own group. If, 
however, they are mixed with individuals of another race in 
their social and vocational activities, they should be rated 
by composite norms. It may be that we should apply com- 
posite norms when tests are used for some purposes and 
separate norms when they are used for other purposes. 
In any case the issue is one which should be decided prag- 
matically rather than on the grounds of a formal definition 
of normality. 

Even if we grant the desirability of having race norms for 
certain purposes, there are two difficulties in the way of 
securing usable norms. The first difficulty arises out of the 
fact that the standing in mental tests is affected by the
social environment of the individual as well as by his race. 
Segregated race groups differ in social environment. Their 
scores are therefore due to the compound result of race and 
environment and it is impossible to disentangle the share 
which is contributed by race from that which is contributed 
by surroundings. The only method by which an approxi- 
mation to comparable race norms may be secured is to 
obtain scores from the different races which live in the same 
social environment. This will enable us to eliminate
differences which are due to gross external circumstances. It
may not, however, eliminate differences due to general
cultural background.

The second difficulty is that of racial mixture. A racial 
norm would apply only to those of pure blood. In an
experiment with the army test,¹ the scores of a group of mulattoes
of lighter skin were compared with another group of darker 
skin. In the Army Alpha the lighter-skinned group made a 
median score of 50 and the darker-skinned group a median 
score of 30. Furthermore, the percentage of darker negroes 
was greater among the illiterates than among the literates. 
Garth found similarly that Indians of mixed blood made 
higher scores than Indians of pure blood. These facts indi- 
cate that racial norms which are adapted to those of pure 
blood would not apply to those of mixed blood. Since it would 
be very difficult either to secure norms for those of mixed 
blood or to determine the degrees of mixture in the case of 
individuals, the application of norms to those of different 
races becomes one of large practical difficulty. 

Social norms. There are unquestionably large differences 
between the average scores of various social groups. This is 
true whether we compare those who live in various neighbor- 
hoods in the city, or whether we compare the city with the 
small town or the rural district, or whether we compare 
different sections of the entire country. Some have con- 
tended that the existence of these differences demands norms 
for social groups. This involves questions similar to those 
which are raised in discussing racial norms. 

In the first place, we must consider the purpose for which 
norms are created. They constitute standards by which 
individuals may be compared with one another through the 
medium of the standard. The question whether we should
have norms for different social groups reduces itself, then, to
this question: Do we wish to compare directly only
individuals of a given social group, or do we wish to compare them
directly with individuals of another group? The individuals
of the various social groups do come into competition with
one another to a greater extent than do individuals of
different races, at least in the case of the negroes and the Indians.
The assumption of our social organization is that the
opportunity for free intermingling and competition is a
complete one. If this is the case it would seem to lead to the
conclusion that we ought to have norms which can be applied
alike to all.

1 See page 735 of the Army Report.

In the case of social level norms, however, there is another 
question involved. The differences between the various 
social groups may be held to be due not to inherent differ- 
ences, but to accidental differences of training and environ- 
ment. In so far as this is the case it may legitimately be held 
that we cannot get at the individual’s real native capacity 
by his raw score. We must make allowance for his training. 

The use of social norms, however, would ascribe the entire
difference between social groups to the differences of their
environment. Most psychologists would regard this
allowance as too great. They would hold that segregation into a
social group is to some extent a selective process based upon
intelligence, and that, therefore, there is a real native
difference between such groups. They would hold, furthermore,
that there is an interaction between the effect of native
capacity and environment. A poor environment is unfavorable
to the development of native capacity, while, on the other
hand, intelligence more or less creates its own environment
by the fact that individuals of meager intellect allow their
environment to deteriorate, while those of higher capacity
improve their environment. The two factors are therefore
so entangled that it becomes almost impossible to determine
how much each is responsible for the group differences which
we find. Environmental norms, therefore, would be based
in part upon an error in assumption, and it would be
impossible to determine how great that error was.

The construction of social norms is attended with a
further difficulty, analogous to that which was mentioned in
connection with race norms. It would be exceedingly
difficult to find a method of grading social environments so as to
apply norms to them. Furthermore, many gradations of 
environment could be found and the same individual is 
subject to the influence of more than one environment. For 
example, his home environment may be of one sort and his 
school environment of another. These complications and 
difficulties seem to make it inadvisable to create norms for 
social groups. 


10. The use of local norms 


Because of the difficulties and complications in the inter- 
pretation of norms which are based upon large-scale testing, 
it is frequently more serviceable to use the average of the 
group which is being tested as a provisional norm, rather 
than to use the general norms which have been derived for 
the general use of the test. The greater number of practical 
uses to which tests are put demands simply that individuals 
of a group be rated in comparison with one another. This 
comparative rating is more easily done if the norm which is 
used agrees with the average standing of the group. This is 
not likely to occur when general norms are used. The de- 
sirability of using local averages rather than general norms 
is particularly great in those cases in which the general norms 
have not been established on a large number of cases se- 
lected at random. Only in the case of the most thoroughly 
and carefully standardized test are the general norms to be 
relied upon. 

In cases in which an individual’s general intelligence is to 
be estimated, one may use such general norms. It is usually 
advisable, however, to base such a rating upon the scores of
several tests rather than of one test alone. The variations
which are frequently found in the intelligence rating of an
individual by different scales make this precaution
necessary. When it is desired only to get a comparative rating of
the individuals of a group for classification or for other 
similar purposes, the use of local norms is to be advised. 
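
The practical difference between the two kinds of norm may be illustrated by a small sketch. The class, the scores, and the published norm of 60 are all hypothetical; the local norm is taken here as the median of the class itself.

```python
from statistics import median

def rate_against(norm, scores):
    """Return each pupil's deviation from the norm (positive means above it)."""
    return {pupil: score - norm for pupil, score in scores.items()}

# Hypothetical scores of a small class on a point scale.
scores = {"A": 62, "B": 71, "C": 75, "D": 80, "E": 88}

general_norm = 60                      # hypothetical published norm
local_norm = median(scores.values())   # provisional norm from the class itself

vs_general = rate_against(general_norm, scores)
vs_local = rate_against(local_norm, scores)
print(vs_general)
print(vs_local)
```

Against the general norm every pupil in this class is rated above normal, which tells us little about their standing relative to one another; against the local norm the class divides about its own center.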


CHAPTER XII 
HOW TO TABULATE THE RESULTS OF TESTS 


The purpose of the present chapter is to indicate and
illustrate the steps which one should go through in tabulating
the results of tests, so that one may arrive at their interpre- 
tation. The purpose is not to make the reader familiar with 
statistical methods. It is not to describe how one proceeds 
in calculating an average or a median, or a probable error, or 
a coefficient of correlation. For information concerning 
these matters the reader is referred to books on statistics. 
The information here given supplements that which is given 
in books on statistics, but is not a substitute for it. 


1. Tabulating the scores 


In choosing an example to illustrate the steps to be taken 
in tabulating the scores, a class the size of the ordinary 
public school class is taken as a unit. The group which is 
used consists of fifty pupils. A larger group would of course 
give measures which are statistically more reliable. The 
individual teacher, however, frequently has occasion to 
tabulate the scores of a single class, and the procedure which 
is appropriate for a larger group is not always appropriate 
for a group of this size. On the other hand, everything which 
may be legitimately done with a group of fifty may also 
be done with a larger group. Furthermore, the statistical 
measures — the median, the quartile deviation, the correla- 
tion coefficient — are reliable enough with a group of this 
size to have practical meaning. 

In order to show each step from the beginning, we shall 
start out with the original table of scores. Table XII shows 
the scores of each child upon all of the tests which are to be
brought into the comparison. In this particular case we 
have the age, the I.Q. on the Stanford Revision of the 
Binet test, the I.Q. on the Otis test, the Otis score, the score 
on the Haggerty test Delta 2, the score on the Gray Oral 
Reading test, the score on the Burgess Silent Reading test, 
and the score on an Arithmetic test. By including certain 
subject-matter or educational tests we shall be able to show 
applications of intelligence tests that we could not otherwise 
illustrate. The numbers in the first column of the table 
represent the individual children of the class. Each hori- 
zontal row of scores, then, was made by one particular child. 

It is obvious that we cannot make general comparisons or 
draw conclusions from large numbers of individual scores 
when they are merely tabulated in this form. They are too 
numerous for us to summarize them by inspection. It is 
therefore necessary to calculate summary scores. We have 
therefore found the average score of the girls as a group, of 
the boys as a group, and of the entire class, for each of the 
tests. Thus we see that the average I.Q. of the girls is 
117.11, of the boys, 124.15, and of the entire class, 118.9. 
In this particular class the boys have a higher I.Q. than 
the girls. 

It will be noticed that the average for the entire class is not 
the average of the two averages for the boys and girls sepa- 
rately. The reason for this is that there are more girls than 
there are boys, and therefore the scores of the girls as a group 
have greater weight in the average of the entire class than do 
the scores of the boys. In order to find the average of the 
entire group, it is necessary to take the total score for the 
whole group and divide by the total number of cases. Only 
in the case that the two sub-groups have the same number 
of cases is it legitimate to average their averages. 
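
The point may be verified with the figures already given. The sketch below weights each sub-group average by its number of cases; the averages are those quoted in the text, while the group sizes of 37 girls and 13 boys are an assumption inferred from the numbering of the pupils in Table XII.

```python
def weighted_average(averages, counts):
    """Average of group averages, each weighted by its number of cases."""
    total_cases = sum(counts)
    return sum(a * n for a, n in zip(averages, counts)) / total_cases

binet_averages = [117.11, 124.15]   # girls, boys (as quoted in the text)
group_sizes = [37, 13]              # assumed from the pupil numbering

class_average = weighted_average(binet_averages, group_sizes)
naive_average = sum(binet_averages) / len(binet_averages)

print(f"weighted by number of cases:     {class_average:.1f}")
print(f"simple mean of the two averages: {naive_average:.2f}")
```

The weighted figure comes to about 118.9, in agreement with the class average given above, whereas the simple mean of the two averages is about 120.6 and overstates the standing of the class.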

Now that our attention has been drawn to the difference 
Table XII. Individual Scores of a Group of Fifty Children
on a Number of Tests


GIRLS

No.    I.Q. Binet    I.Q. Otis
 1.       ...           119
 2.       ...           111
 3.       ...           107
 4.       ...           115
 5.       ...           117
 6.       ...           126
 7.       ...           103
 8.       ...           120
 9.       ...           121
10.       113           126
11.       113           120
12.       105            -
13.       133           ...
14.       143           129
15.       125           125
16.       130           121
17.       118           130
18.       108           117
19.       114           110
20.       123           ...
21.       116           113
22.       126           120
23.       109           116
24.       107           109
25.       116           117
26.       126           125
27.       136           125
28.       100            90
29.       117            -
30.       106            92
31.       119           124
32.       104           105
33.       128           120
34.       112           111
35.       112            -
36.       117            -
37.       112           122

Total     4331          3850
Average   117.11        116.66

[Only the two I.Q. columns of the girls' table are legible in this copy;
... marks an illegible entry and - a blank one.]

Table XII. Individual Scores of a Group of Fifty Children
in a Number of Tests (continued)


BOYS

No.    Age in 1923    I.Q. Binet    I.Q. Otis    Otis Score
38.        10            126           132           60
39.        10            123           126           52
40.        12            139           118           55
41.        11            121           118           48
42.         9            147           125           46
43.        11            112           118           47
44.        11            137           120           58
45.        10            114           113           39
46.        10            127           111           37
47.        11            114            -             -
48.        13            124           107           52
49.        11            116            -             -
50.        11            114           110           43

Total                   1614          1298          537
Average                 124.15         118          48.82

[The boys' individual scores on the Haggerty, Gray Oral Reading, Burgess
Silent Reading, and Arithmetic tests are not legible in this copy; - marks
a blank entry.]


               I.Q.    I.Q.    Otis    Haggerty   Gray Oral   Burgess   Arithmetic
               Binet   Otis    Score
Total Girls    4331    3850    1542    4124       2092.5      2836      1990
Total Boys     1614    1298     537    1607        681.25      996       622
Grand Total    5945    5148    2079    5731       2773.75     3832      2612
Average         ...     ...    47.25   116.95       59.01      78.20     60.74


between the boys and girls, let us examine the other aver- 
ages, and see whether they give results consistent with the 
averages of the I.Q.s. The Otis I.Q. again gives higher 
averages for the boys, as does the Otis score. It does not 
follow, of course, that because the boys have a superior 
average Otis score, they should also have a superior average 
Otis I.Q. If they were sufficiently older than the girls the 
average score might be higher and the I.Q. lower. Again, 
we find that the boys make a higher score on the Haggerty 
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test. Thus, on all the intelligence tests the score of the boys 
is above that of the girls. When we examine the educational 
tests, on the other hand, the girls are superior. 

The explanation of the difference between boys and girls in 
the relation between capacity and achievement is not the 
chief problem before us. It may be said that this is not an 
isolated finding, and that the chief suggestion toward an 
explanation is that either the school work is more suited to 
the interest of the girls than of the boys, or that girls are 
more conscientious and studious. 

While the average gives us a number which is readily 
grasped and which is convenient for making comparisons, 
it does not tell us all that we need to know about the scores 
of a group. The entire list of individual scores is too com- 
plex to grasp, but, on the other hand, the single summary 
figure which is represented in the average is too much sim- 
plified to give us all the information we need. It does not 
tell us, for example, whether the scores cover a wide range or 
a narrow range, or whether the largest number of scores fall 
in one part of the range or in another part. 


2. The distribution table 

In order to secure more information about the distribution 
of the scores than is given in the average we may tabulate 
them in such a way as to show the number of individuals 
who make the various scores. A table in which the scores 
are classified in this way is called a distribution table. The 
distributions of the scores given in our basic table are shown 
in Table XIII and Table XIV. The meaning of these tables 
may be gained from the distribution of the Binet I.Q.s 
shown in Table XIII. This distribution shows that there 
was one child whose I.Q. was in the class 145-149, another 
whose score was in the class from 140-144, three whose I.Q.s 
were in the class 135-139, and so on. 

TABLE XIII. DISTRIBUTION OF I.Q.s

[Columns: class intervals; Binet Scale; Otis Scale. The final row gives the median. The frequencies are not legible in this copy.]

TABLE XIV. DISTRIBUTION OF THE SCORES IN FIVE TESTS

[Columns: class intervals; Burgess Reading Test; Arithmetic Test; Haggerty Test; Gray Reading Test; Otis Test. The class intervals that survive run from 145-149 down to 90-94 in steps of five points. The frequencies are not legible in this copy.]

Let us go back for a moment and consider the steps which 
are gone through in constructing such a distribution table. 
It will be noticed that the scores are classified or grouped. 
The table does not show how many individuals make each 
particular score, but rather how many make scores which 
fall within a particular class. The range of scores which was 
used in this table in making up the classes is five points. 
If the frequency of each individual score had been tabulated 
the scattering would have been too great to enable us to 
determine where the greatest frequency lies. We have 
therefore grouped the scores into classes and have chosen 
class intervals that will give from ten to fifteen classes 
or groups. With a small number of cases the number of class 
intervals should be fewer than with a large number of cases. 

The first step in making a distribution table, then, is to 
determine what our class intervals shall be. We may do 
this by finding the highest and the lowest score and the 
difference between them. This will give us the entire range 
of the scores. We may then provisionally divide this entire 
range by ten, or some larger number if we have a large num- 
ber of cases. This will give us the range of each class 
interval. For example, in the case of the Binet I.Q.s the 
lowest score is 100 and the highest 147. The difference, 47, 
divided by 10 gives us 4.7. Since 5 is the nearest whole 
number to 4.7, and since 5 is a convenient class interval, we 
select this for our range. 

The next step is the very simple one of finding the number 
of scores in each class interval by the method of tallying. 
The form of the tally record is illustrated in the second col- 
umn of Fig. 15, page 315. By adding the tallies of each 
class we have our distribution table. 
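
The two steps just described, choosing the class interval and tallying, may be sketched as follows. The scores are hypothetical; only the lowest (100) and the highest (147) are taken from the text. The function names are illustrative, and the sketch assumes whole-number scores and classes that begin at multiples of the class interval.

```python
from collections import Counter


def class_interval_width(scores, n_classes=10):
    """Range of the scores divided by the desired number of classes, rounded."""
    score_range = max(scores) - min(scores)
    return max(1, round(score_range / n_classes))


def distribution_table(scores, width):
    """Tally the scores into classes.

    Returns (lower, upper, frequency) rows, highest class first.
    """
    tallies = Counter((score // width) * width for score in scores)
    top = (max(scores) // width) * width
    bottom = (min(scores) // width) * width
    return [(low, low + width - 1, tallies[low])
            for low in range(top, bottom - 1, -width)]


# Hypothetical integer scores; only the extremes (100 and 147) come from the text.
scores = [100, 106, 112, 114, 116, 118, 118, 121, 126, 133, 139, 147]

width = class_interval_width(scores)  # 47 / 10 = 4.7, rounded to 5
for low, high, frequency in distribution_table(scores, width):
    print(f"{low}-{high}: {frequency}")
```

Classes in which no score falls are kept with a frequency of zero, so that gaps in the distribution remain visible.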

From the distribution table it is easy to calculate the me- 
dian. The medians in most cases are not very different from 
the arithmetic mean, which was the average used in the basic 
table. Since the median is easy to calculate, it may be 
used in place of the arithmetic mean where we wish to obtain 
an approximate average for the distribution. 
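
One common way of finding the median from a distribution table is to interpolate within the class that contains the middle case. The sketch below assumes that method. The frequencies are hypothetical, not those of Table XIII; they were chosen only so that the result agrees with the figures quoted in this chapter.

```python
def grouped_median(classes, width):
    """Median by interpolation within the class that contains the middle case.

    classes: (lower_limit, frequency) pairs in ascending order.
    """
    total = sum(freq for _, freq in classes)
    middle = total / 2
    below = 0  # number of cases in the classes already passed
    for lower, freq in classes:
        if freq > 0 and below + freq >= middle:
            return lower + (middle - below) / freq * width
        below += freq
    raise ValueError("empty distribution")


# Hypothetical frequencies (not the figures of Table XIII), fifty cases in all.
binet_classes = [(100, 3), (105, 7), (110, 10), (115, 10), (120, 6),
                 (125, 7), (130, 2), (135, 3), (140, 1), (145, 1)]

print(grouped_median(binet_classes, width=5))
```

With these assumed frequencies the twenty-fifth case lies halfway through the class 115-119, which gives a median of 117.5.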

We may learn the characteristics of a distribution by 
inspecting the distribution table. They are brought out 
more clearly, however, by means of a chart. The most 
commonly used type of graph of a distribution is the column 
diagram, or histogram. The histogram of the Binet I.Q.s 
shown in Table XIII is given in Fig. 14. 

FIG. 14. HISTOGRAM OF BINET I.Q.s
[Column diagram: class intervals from 100-104 to 145-149 along the base; number of cases on the vertical scale.]

This histogram shows at a glance that the distribution is 
skewed. The upper part of the distribution conforms fairly 
closely to what is called the normal frequency distribution. 
The lower part, however, has the appearance of being cut 
off rather abruptly. There are no I.Q.s below 100. In an 
unselected group, of course, there would be as many below as 
above 100. It would appear, therefore, that the children 
of this class are not representative of the population as a 
whole, but represent only the upper half of the population. 
We need not here discuss the question whether the evident 
superiority of these pupils is due wholly to innate capacity, 
or in part to training. 

The form of the distribution throws light upon two matters, 
first, the selection of the cases, and second, the suitability 
of the test. Take first the selection of the cases. Numerous 
distributions of scores have convinced psychologists 
that abilities are distributed, at least approximately, according 
to the normal frequency distribution, provided there is 
a random sampling of individuals. We have a random 
sampling when the relative frequencies of the cases representing 
different degrees of ability are the same as the relative frequencies 
of corresponding abilities in the population as a whole. The 
most prominent characteristic of the normal distribution is 
that the largest number of cases occur in the middle and that 
the two sides of the curve of distribution are symmetrical. 

If the sampling is not random, the curve of distribution is 
likely to be unsymmetrical. An unsymmetrical curve, 
however, may also be an indication of the inadequacy of 
the test. If the test is too hard for the group, the largest 
number of scores will fall toward the bottom of the scale, 
and the distribution curve will be skewed toward the upper 
part of the scale. If the test is too easy, the largest number 
will fall toward the upper part of the scale and the curve will 
be skewed toward the bottom. We cannot be sure, there- 
fore, from the form of the curve, what the cause of the skew- 
ness is. A skewed curve should lead us to pursue our in- 
vestigation until we arrive at its explanation. 


3. The percentile curve 


Another useful form of graphic representation of a distribution 
is a percentile curve. The advantages of a percentile 
curve are thus set forth by Otis.¹ “A percentile curve 
shows at a glance not only the median score of a class, but 
also the range and variability of the scores. It shows at a 
glance just what per cent of the scores of a class is exceeded 
by the score of any given individual, and just what per cent 
of the class attains or exceeds any given score. Two or more 
curves on the same graph show very vividly the amount of 
overlapping of the scores of different classes.” 


¹ Arthur S. Otis. Manual of Directions and Key to the Otis Self-Administering 
Test of Mental Ability, p. 10. World Book Company, 1922. 

To get an understanding of the percentile curve, let us go 
through the procedure by which it is made up. (See Fig. 
15.) We start from the distribution table as before, and in 
this illustration we use the distribution of the Binet I.Q.s. 
The first column at the left of the chart shows the class in- 
tervals of the scores. The next column shows the first step 
in making a distribution table. It contains the tallies of the 
scores which fall within the various class intervals. In the 
next column we depart from the distribution table. Instead 
of writing in the number of cases in each class interval, we 
write in each space the total number of cases in that class 
interval plus all of those in the lower intervals. The figures 
then represent the cumulative frequency. In the next 
column these cumulative frequencies are translated into 
terms of the percentage of the total number of cases. 

We are now ready to construct the graph. The scale along 
the bottom of the chart represents the percentage of cases. 
The vertical scale constructed in the middle of the chart 
represents the scores. The significance of the curve in 
general is this. Each point on the curve represents the per- 
centage of the group which makes a given score or lower. 
Thus, in this particular case, 20 per cent of the children 
make a score of 110, or lower; 40 per cent make 115 or lower; 
and 90 per cent make a score of 135 or lower. 

Before commenting further upon the facts which are shown 
by the chart, let us go back for a moment and trace the 
steps in constructing the curve. The procedure in brief is 
as follows. Place a point at the lower limit of the first class 
interval and at the zero point on the horizontal scale. 
Second, place a point at the upper limit of the first class 
interval and at the place on the horizontal scale which represents 
the percentage of cases in this interval. Third, place 
a point at the upper border of the second class interval, at 
the place on the horizontal scale representing the cumulative 
percentage for the second group, that is, the percentage of 
cases in the first two intervals together. Do the same for each 
of the other class intervals. The final step is to draw a smooth 
curve through the points which have been plotted. 
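
The points to be plotted may be computed as in the following sketch. The frequencies are again hypothetical, chosen only so that the cumulative percentages agree with those quoted above (20 per cent at 110, 40 per cent at 115, 90 per cent at 135). The function name is illustrative, and the upper limit of each class is taken as the lower limit of the next.

```python
def percentile_curve_points(classes, width):
    """Return (cumulative per cent, score) points for a percentile curve.

    classes: (lower_limit, frequency) pairs in ascending order.
    """
    total = sum(freq for _, freq in classes)
    # First point: lower limit of the first class, at zero per cent.
    points = [(0.0, classes[0][0])]
    cumulative = 0
    for lower, freq in classes:
        cumulative += freq  # cases in this class plus all those in lower classes
        points.append((100 * cumulative / total, lower + width))
    return points


# Hypothetical frequencies (not the figures of Table XIII), fifty cases in all.
binet_classes = [(100, 3), (105, 7), (110, 10), (115, 10), (120, 6),
                 (125, 7), (130, 2), (135, 3), (140, 1), (145, 1)]

points = percentile_curve_points(binet_classes, width=5)
for per_cent, score in points:
    print(f"{per_cent:5.1f} per cent make {score} or lower")
```

A smooth curve drawn through these points is the percentile curve; joining them by straight lines gives a close approximation.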

The use of the percentile curve may be further illustrated. 
The median, of course, is the fifty percentile. This is found 
by noting on the vertical scale where the percentile curve 
crosses the 50 per cent line. It will be seen that this is 117.5. 
It agrees with our calculation of the median. An important 
measure of the characteristic of a distribution is the 
measure of its variability. A commonly used measure of 
variability is very easily found from the percentile curve. 
This is Q, which represents half the difference between the 
75 percentile and the 25 percentile score, or half the difference 
between $Q_3$ and $Q_1$. $Q_1$ is found by locating the point on the 
curve above 25 on the horizontal scale, and $Q_3$ by locating 
the point above 75 on the horizontal scale. These are indicated 
by the short cross-lines on the curve. In the present 
case, $Q_1$ is 111.3, $Q_3$ is 126, and $Q = \frac{126 - 111.3}{2} = 7.4$. Twice 
Q, or 14.7, is the range which includes the middle half of the 
scores of the group. This is a convenient measure by which 
to compare various distributions with one another. 
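
The quartiles may also be computed instead of being read from the curve. The sketch below interpolates along straight lines within each class, so its results will differ slightly from values read by eye from a smooth curve. The frequencies are hypothetical and the function name is illustrative.

```python
def grouped_percentile(classes, width, per_cent):
    """Score below which the given per cent of cases fall, by interpolation.

    classes: (lower_limit, frequency) pairs in ascending order.
    """
    total = sum(freq for _, freq in classes)
    target = total * per_cent / 100
    below = 0  # number of cases in the classes already passed
    for lower, freq in classes:
        if freq > 0 and below + freq >= target:
            return lower + (target - below) / freq * width
        below += freq
    raise ValueError("empty distribution")


# Hypothetical frequencies (not the figures of Table XIII), fifty cases in all.
binet_classes = [(100, 3), (105, 7), (110, 10), (115, 10), (120, 6),
                 (125, 7), (130, 2), (135, 3), (140, 1), (145, 1)]

q1 = grouped_percentile(binet_classes, 5, 25)
q3 = grouped_percentile(binet_classes, 5, 75)
q = (q3 - q1) / 2  # semi-interquartile range

print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, Q = {q:.2f}")
```

With these assumed frequencies the computed values lie close to, though not exactly upon, the 111.3, 126, and 7.4 read from the curve.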

We cannot, of course, compare directly the semi-inter- 
quartile range or Q in order to get the relative variability of 
two distributions, unless the same tests were used in the 
two cases. For the procedure to be followed when the scale 
or test is different, the reader is referred to a book on 
statistics. 

The percentile graph is useful, finally, as a means of 
giving a convenient measure of the position of the individual 
in the distribution and of comparing the individual’s position 
in different distributions. Suppose that, in the present case, 
a pupil’s I.Q. was 115. This would mean that his percentile 
rank was 40. In other words his I.Q. exceeds that of the 
lower 40 per cent of the group and is exceeded by the upper 
60 per cent. If now, we wish to determine whether the 
pupil’s score in arithmetic is higher or lower, relatively to 
that of the entire class, than is his I.Q., we can do so by 
finding his percentile rank in arithmetic and comparing it 
with his percentile rank in I.Q. 
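
The comparison may be sketched as follows. Both distributions are hypothetical, as are the pupil's arithmetic score of 70 and the function name; only the I.Q. of 115 and its percentile rank of 40 are taken from the text.

```python
def percentile_rank(classes, width, score):
    """Per cent of the group falling below the given score.

    classes: (lower_limit, frequency) pairs in ascending order.
    Cases are assumed to be spread evenly within each class.
    """
    total = sum(freq for _, freq in classes)
    below = 0.0
    for lower, freq in classes:
        if score >= lower + width:
            below += freq  # the whole class lies below the score
        elif score > lower:
            below += freq * (score - lower) / width  # part of the class
    return 100 * below / total


# Hypothetical distributions for one class of fifty pupils.
binet_classes = [(100, 3), (105, 7), (110, 10), (115, 10), (120, 6),
                 (125, 7), (130, 2), (135, 3), (140, 1), (145, 1)]
arithmetic_classes = [(30, 4), (40, 8), (50, 12), (60, 14), (70, 8), (80, 4)]

iq_rank = percentile_rank(binet_classes, 5, 115)
arithmetic_rank = percentile_rank(arithmetic_classes, 10, 70)

print(f"percentile rank in I.Q.: {iq_rank:.0f}")
print(f"percentile rank in arithmetic: {arithmetic_rank:.0f}")
```

In this invented case the pupil stands higher in arithmetic, relatively to the class, than he does in I.Q.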


4. Correlation 


The comparison just suggested between a pupil’s per- 
centile rank in two tests is a rough means of finding the cor- 
relation in an individual case. The next procedure is to 
tabulate the scores on pairs of tests, so as to bring out the 
relationship between the scores for the group as a whole. 
Take as an illustration the correlation table representing the 
relation between the Binet I.Q. and the Otis I.Q., Table XV. 

TABLE XV. CORRELATION TABLE SHOWING THE RELATION 
BETWEEN BINET I.Q.s AND OTIS I.Q.s

[Rows: Binet Scale class intervals, from 145-149 down to 100-104. Columns: Otis Scale class intervals. A final row and a final column give the totals. The cell entries are not legible in this copy.]

r = .50 ± .076


A correlation table is really a simultaneous distribution 
of the scores of one test in one dimension and scores of the 
other test in the other dimension. In this particular case, 
the distribution of scores in the Binet test is represented 
vertically and the distribution of the scores in the Otis test 
horizontally. The total distribution of Binet I.Q.s is shown 
in the last column to the right and the distribution of the Otis 
I.Q.s in the lowest horizontal row. 
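
The construction of such a table may be sketched as follows. The pairs are the Binet and Otis I.Q.s of boys 38 to 45 as they read in Table XII; the class interval of five points follows the distribution tables, and the function names are illustrative.

```python
from collections import Counter

WIDTH = 5  # class interval, as in the distribution tables


def class_lower_limit(score, width=WIDTH):
    """Lower limit of the class in which a whole-number score falls."""
    return (score // width) * width


def correlation_table(pairs, width=WIDTH):
    """Cross-tabulate (Binet, Otis) pairs.

    Returns the cell counts and the two marginal distributions.
    """
    cells = Counter()
    binet_totals = Counter()
    otis_totals = Counter()
    for binet, otis in pairs:
        row = class_lower_limit(binet, width)
        column = class_lower_limit(otis, width)
        cells[(row, column)] += 1
        binet_totals[row] += 1
        otis_totals[column] += 1
    return cells, binet_totals, otis_totals


# (Binet I.Q., Otis I.Q.) for boys 38-45 as they read in Table XII.
pairs = [(126, 132), (123, 126), (139, 118), (121, 118),
         (147, 125), (112, 118), (137, 120), (114, 113)]

cells, binet_totals, otis_totals = correlation_table(pairs)
columns = sorted(otis_totals)

# Binet classes run vertically (highest first), Otis classes horizontally.
print("Binet    " + " ".join(f"{c:>4}" for c in columns) + "  Total")
for row in sorted(binet_totals, reverse=True):
    counts = " ".join(f"{cells[(row, c)]:>4}" for c in columns)
    print(f"{row}-{row + WIDTH - 1}  {counts}  {binet_totals[row]:>5}")
print("Total    " + " ".join(f"{otis_totals[c]:>4}" for c in columns))
```

The last column and the last row of the printed table are the separate distributions of the two tests, as described above.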

Consider the make-up of the table from the point of view 
of individual cases. In the lower left-hand corner is the 
figure 1. This means that one child had an I.Q. on the 
Binet Scale within the range 100-104, and an I.Q. on the Otis 
scale within the range 90-94. Directly above him is represented 
a child whose Binet I.Q. is in the class 105-109 and 
whose Otis I.Q. is in the class 90-94. In the upper horizontal 
row, we find a tally representing a child whose Binet 
I.Q. is in the class 145-149, and whose Otis I.Q. is in the 
class 125-129. In these cases, the I.Q.s in the two tests 
correspond fairly well. A low I.Q. in the one test goes with 
a low I.Q. in the other, or a high I.Q. in the one test goes 
with a high I.Q. in the other. 

In other cases, however, the correspondence is not so 
close. For example, one child has a Binet I.Q. which places 
him in the lowest section of the scale, but an Otis I.Q. in the 
class 115-119. It is easy to locate roughly those cases in 
which the scores on the two tests correspond and those on 
which they differ. All the cases which cluster about a diag- 
onal line running from the lower left-hand to the upper right- 
hand corner are cases in which the two scores correspond. 
Those which fall in the upper left-hand or the lower right-hand 
corner of the table are cases in which there is a discrepancy. 
In the present comparison there are no cases of 
children who have a high I.Q. in the Binet test and a low 
I.Q. in the Otis, but there are a number of cases of 
children who have a comparatively high I.Q. in the Otis, 
but a low I.Q. in the Binet. It appears that the qualities 
which enable a child to do well in the Binet test also enable 
him to do well in the Otis test, but that there are certain 
qualities which make possible a comparatively high score 
on the Otis test, but which are not adequate to give a high 
score on the Binet test. What these qualities are would 
require further analysis. It may be that, since a group test 
is more largely a measure of speed than an individual test, 
rapidity of performance is the quality which gives a high 
score in the Otis test, but not in the Binet. This question 
illustrates the way in which a correlation table may be used 
to make a further analysis of the scores than can be made 
from the distribution of the scores in the individual tests 
alone. 

Another way in which a correlation table showing the dis- 
tribution of the scores in two intelligence tests may be used 
is to discover the cases of children on whom one or the other 
of the tests appears to be unreliable. It is quite possible 
that, in any particular test, a child may do himself in- 
justice due to the circumstances of the moment. If we find 
that a child makes a low score in one test and a high score 
in another, we should follow the matter up by giving him a 
third test, in this way determining more nearly what his 
true rank is. It is a common practice to give an individual 
test to a child when his scores on the two group tests show 
wide discrepancy. 

For practical administrative uses the detailed correlation 
table gives most of the information on correlation which we 
need. In order to determine the amount of correlation 
between two tests so that it may be compared with the 
amount of correlation between other tests, it is necessary to 
express the correlation in terms of a single coefficient. This 
coefficient is derived by the use of one of the formulæ which 
are now available and which can be found in books on 
statistics. It will be remembered that the range of the coefficients 
is from −1, which expresses complete negative correlation, 
through 0, which expresses no correlation whatever, 
to +1, which expresses perfect positive correlation. The 
correlation coefficient has been calculated from Table XV, 
and found to be .50 ± .076, as stated at the bottom of the 
table. This is commonly regarded as a rather low correlation 
between intelligence tests. We usually expect the 
correlation coefficient between tests of the same nature to 
be .70 or higher. A possible explanation of the low correlation 
in this case is that the range of the distribution of I.Q.s 
is narrow. In the case of the Binet test there is no I.Q. 
below 100. It will be remembered that the distribution is a 
skewed one, suggesting that the lower part has been cut off 
by some process of selection. Furthermore, the I.Q.s on 
the Otis tests are also nearly all above 100. This means 
that the pupils of this class are more nearly homogeneous in 
intelligence than are the pupils of an unselected group. The 
correlations between tests of a homogeneous group are 
always lower than the correlations in the case of a group of 
more widely scattered abilities. 
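
For the reader who wishes to see the calculation, the following sketch computes the product-moment coefficient from ungrouped pairs of scores, together with a probable error. It is not necessarily the formula used for Table XV. The probable error is taken, by assumption, as 0.6745 (1 − r²) / √N, and the number of cases, 44, is an assumption based on the children in Table XII who have both I.Q.s.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


def probable_error(r, n_cases):
    """Assumed conventional probable error of r: 0.6745 (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n_cases)


# Assumption: 44 children have both a Binet and an Otis I.Q.
print(round(probable_error(0.50, 44), 3))

# Limiting cases of the coefficient, using made-up scores.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # scores rise together
print(pearson_r([1, 2, 3], [3, 2, 1]))         # one rises as the other falls
```

Under these assumptions the probable error for r = .50 comes to .076, which agrees with the figure given at the bottom of Table XV.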

Table XVI shows the correlation between the Otis score 
and the Haggerty score. This correlation is still lower than 
that between the Binet I.Q. and the Otis I.Q. We might 
expect that there would be a higher correlation between two 
group tests than between a group test and an individual 
test. No explanation is suggested for this lower correlation, 
unless it be that the Haggerty test is less reliable in this 
particular case than is the Otis test. It is more probable, 
however, that the difference is an accidental one. 

It will be noticed that, in the preceding tables, the I.Q.s 
were correlated with I.Q.s and the raw scores with the raw 
scores. It is not legitimate, in general, to correlate I.Q.s 
with raw scores or with mental ages. The only case in 
which this would be legitimate is the one in which all the 
pupils are the same age. In such a case, of course, the I.Q. 
and the mental age are comparable. The reason that we 
cannot correlate the I.Q. with mental age or with raw score, 
is that I.Q. expresses the relative standing or brightness of 
the pupil, and remains the same from age to age, whereas 
mental age or raw score represents the attainment of the 
pupil on a fixed scale. Thus, suppose that a very bright 
child was in a class with other children who were older than 
himself. He would stand high in I.Q. but would stand low, 
or at least have a medium rank in mental age. There 
would thus appear to be a discrepancy between his intelligence 
quotient and his mental age. This discrepancy 
would not appear if he were ranked in terms of I.Q. in two 
tests, since in both cases his standing would be high. A 
discrepancy would not appear, furthermore, if he were 
ranked in mental age in two tests, since in this case his rank 
would be medium or low in both cases. The general rule, 
then, is that we should always correlate a relative score with 
a relative score, or an absolute score with an absolute score, 
but never a relative with an absolute score. 
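
The point may be illustrated with a small invented class. The sketch assumes the usual definition of the I.Q. as mental age divided by chronological age, multiplied by 100; the pupils, their ages, and the function names are all hypothetical.

```python
def intelligence_quotient(mental_age, chronological_age):
    """I.Q. as the ratio of mental age to chronological age, times 100."""
    return 100 * mental_age / chronological_age


def rank_of(name, values):
    """Rank of one pupil (1 = highest) among a dict of name -> value."""
    ordered = sorted(values, key=values.get, reverse=True)
    return ordered.index(name) + 1


# Hypothetical class: (chronological age, mental age) in years.
pupils = {
    "young bright child": (8, 10),
    "classmate A": (11, 11),
    "classmate B": (12, 12.5),
    "classmate C": (12, 11.5),
}

iq = {name: intelligence_quotient(ma, ca) for name, (ca, ma) in pupils.items()}
mental_age = {name: ma for name, (ca, ma) in pupils.items()}

child = "young bright child"
print(f"rank in I.Q. (relative score): {rank_of(child, iq)}")
print(f"rank in mental age (absolute score): {rank_of(child, mental_age)}")
```

The young bright child ranks first in I.Q. and last in mental age, so that a correlation between the I.Q. in one test and the mental age in another would show a discrepancy which arises from the mixture of measures and not from the tests.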

The next tables show the correlation between intelligence 
tests and educational tests. Table XVII shows the correlation 
between the Otis score and the score in the Gray Oral 
Reading Test. It appears both from the inspection of the 
table and from the correlation coefficient that there is no 
correlation between these two measures. The pupil’s ability 
in oral reading does not seem to be determined by his brightness 
or his intelligence. This lack of correlation is, of course, 
to be interpreted in the light of the fact that we have a 
relatively homogeneous group. If the group contained 
pupils of very low intelligence, we should undoubtedly find 
their reading attainment to be also comparatively low. 

TABLE XVII. CORRELATION TABLE SHOWING THE RELATION 
BETWEEN OTIS SCORES AND GRAY SCORES

[Columns: Gray Scale class intervals, three points wide (among them 62-64, 65-67, 68-70, and 74-76). The row headings, the cell entries, and the coefficient printed beneath the table are not legible in this copy.]

It is rather more surprising to find evidence in Table 
XVIII that there is little or no correlation between the intelligence 
test score and the score in the Burgess Silent 
Reading Test. We ordinarily expect a positive correlation 
between a measure of intelligence and a measure of silent 
reading. A moment’s reflection, however, reminds us that 
the Burgess test is a rather easy one, and that it possibly 
does not discriminate satisfactorily between the silent reading 
abilities of pupils in the fifth and sixth grades. The 
rather high correlation between the oral reading score and 
the silent reading score, as shown in Table XIX, suggests 
further that, for these grades, the Burgess tests measure the 
more mechanical aspects of reading rather than the ability 
to get thought from the printed page. 

TABLE XVIII. CORRELATION TABLE SHOWING THE RELATION 
BETWEEN OTIS SCORES AND BURGESS SCORES

[Columns: Burgess Scale. The body of the table is not legible in this copy.]

TABLE XIX. CORRELATION TABLE SHOWING THE CORRELATION 
BETWEEN GRAY SCORES AND BURGESS SCORES

[Columns: Gray Scale. Rows: Burgess Scale. The body of the table is not legible in this copy.]

Since there is practically no correlation between intelligence 
tests and these particular reading tests, we cannot use 
such a table as No. XVIII to analyze and interpret the relative 
scores of individual pupils. The degree of correlation 
does not give us ground to expect that a high intelligence 
score will be accompanied by a high achievement in the subject. 
We are not justified in expecting that a particular 
pupil, because he makes a high score in the intelligence test, 
should make a high score in the subject-matter test. We cannot 
use the intelligence test in such a case to segregate pupils 
according to their capacity, nor can we regard a high intelligence 
and a low achievement score as evidence of a lack 
of application. It is only when there is a fairly high correlation 
in general between the intelligence test and the subject-matter 
test that we can make such an administrative use of 
the scores. 

In those cases in which there is in general a rather high 
correlation between the scores on two tests, it is appropriate 
to examine those cases which exhibit very wide discrepancy. 
In the case of the correlation between the Gray scores and 
the Burgess scores, for example, we find three children who 
make high scores on the Burgess test and low scores on the 
Gray test. These children evidently have some specialized 
difficulty in oral reading. The detailed discussion of such a 
problem as this does not belong in a book on mental tests. 
The case is an illustration, however, of the sort of examina- 
tion which may be made of the relation between an intelli- 
gence test and an educational test. 

Our final table, XX, gives the correlation between the 
Otis scores and the Arithmetic scores. The correlation 
between these scores is low, but perhaps high enough to 
warrant the study of cases showing a very wide discrepancy. 
The pupil who makes a score of 70 on the Arithmetic test 
and only 35 on the Otis test appears either to have special- 
ized ability in arithmetic, to be very industrious, or to be 
incorrectly rated on the Otis test. The examination of other 
evidence concerning the child’s ability would probably 
indicate which of these suppositions is the correct one. 

TABLE XX. CORRELATION TABLE SHOWING THE RELATION

[The remainder of the caption and the body of the table are missing from this copy.]

The aim of the foregoing discussion has been to indicate 
briefly some of the chief ways in which the scores of mental 
tests may be tabulated and used to promote the efficiency of 
teaching and the handling of pupils. The reader will have 
been impressed with the fact that the results of tests should 
be used cautiously, and that a hasty application of them 
should be avoided. Teachers should be on the lookout for 
discrepancies, and should attempt to follow them up in 
order to arrive at their correct interpretation. Through the 
repeated use of tests and the analysis of their results, the 
teacher and the principal or the supervisor should gradually 
gain a notion of the general capacities, the special capacities, 
and the weaknesses of individual pupils. The use of the 
knowledge which is thus gained will be discussed in detail in 
the chapter on the application of tests. 

The account in this chapter has been designed to serve the 
teacher or principal, rather than the research officer or the 
superintendent. The illustrations have purposely been 
chosen from the narrower use of tests with small groups and 
the use which relates to the practical handling of the in- 
dividual pupil, rather than to the larger application to ad- 
ministration or research. The teacher cannot well use the 
refined and elaborate methods which are appropriate for 
such wider use, and the research student, or the administra- 
tor of research departments, does not need the rather ele- 
mentary treatment which is here given. For these reasons 
the present chapter has been directed particularly to the needs 
of the teacher rather than of the more highly trained expert. 


CHAPTER XIII 


THE BEARING OF MENTAL TESTS UPON 
MENTAL GROWTH 


The prevailing opinions concerning the nature of intellectual 
growth have been derived from various sources, some of 
them scientific, and some unscientific. Prior to the advent 
of experimental psychology, opinions on that subject were 
based upon observation. With the introduction of the child- 
study movement, information became somewhat more ob- 
jective and systematic. The chief reliance, however, was 
upon the questionnaire method, and this falls far short of 
the necessary reliability. The introduction and wide-scale 
application of mental tests has provided a wealth of facts 
which vastly exceed all the evidence which had been 
gathered before them. Up to the last three or four years, 
very little use had been made of this mass of facts. We shall 
attempt in this chapter a brief résumé of the problems, and 
of the conclusions which the facts seem at present to warrant 
us in drawing. 


1. The problems concerning mental growth 


The first problem concerns the rate of mental growth at 
various ages. The rate of growth is expressed in the mental 
growth curve. We are concerned, first, with the general 
form of this curve. Does it have the same steepness through- 
out its length, or does it rise more rapidly at one time than 
at another? If it is not uniform in steepness, is it steeper 
in the earlier years or in the later years? After the general 
form has been determined, the question remains whether 
there are minor fluctuations in rate of growth at particular 
periods in the child’s life. Is there, for example, a marked 
acceleration at the period of adolescence or at some other 
period? 

The second problem concerns the relation between the 
curves of growth of different individuals. Are these curves 
of growth in general parallel, or do they cross each other? 
To put it in another way, does the child keep his relative 
position throughout his period of mental growth, or does 
his rank shift up or down in comparison with others? 
Another aspect of this problem is this: Do the growth curves 
of different children culminate at different times? Spe- 
cifically, does the gifted child reach the culmination of his 
mental growth earlier or later than the average child or the 
dull child? Various possibilities in the relationship of the 
level of capacity to the age of culmination exist. Which of 
these possibilities is the actuality? 

A third problem is somewhat related to the second. It 
concerns the amount of variability in the capacities of the 
groups of children at different ages. Is the range or variabil- 
ity of scores made by children at the upper ages greater than, 
equal to, or less than the range or the variability of the scores 
of younger children? The problem may perhaps seem more 
significant if we think of it in relation to the rapidity of 
growth of children at different levels. If the variability 
increases in the later ages, the children below the median, 
or the dull children, must develop more slowly than the 
median child. In the same way the gifted child must grow 
in intellect more rapidly than the median child. The lines 
of growth, then, must diverge. If, on the other hand, the 
variability remains the same, the lines of growth of superior, 
average, and inferior children will be parallel. The scores 
on mental tests should throw some light on the answer to 
this problem. 

A fourth problem is this. How long do children continue 
to increase in mental stature? Do they stop at sixteen 
years, as is implied in the Stanford Revision, or do they 
reach adult stature at thirteen and one half, as is believed 
by many interpreters of the army tests, or do they continue 
beyond both of these ages? The answer to this problem, as 
to the others, is by no means simple or easy to obtain, as is 
indicated by the divergence of views among competent 
psychologists. The results of our mental tests, however, 
bring us much nearer to the solution than we were before. 

We may perhaps consider a further problem to be the 
practical application of the facts to the technique of mental 
testing, on the one hand, and to problems of school admin- 
istration and of teaching, on the other hand. Some of these 
applications have already been referred to in previous 
chapters, and others will be referred to in the following
pages.


2. The form of the mental-growth curve 


The form of the mental-growth curve has long been in- 
ferred from certain early studies of the simpler mental pro- 
cesses. Typical of these are Smedley’s study of the growth 
in memory span, and Gilbert’s study of the growth in a 
variety of mental capacities. We may use Smedley’s curve 
and two of the curves from Gilbert, by way of illustration. 
Fig. 16¹ indicates that the memory span for digits presented
orally reaches its culmination at about thirteen years of age.
The memory for visually presented digits advances slightly
beyond this age, but at a much lower rate than below it.
The curve of development in weight discrimination, as
found by Gilbert² and shown in Fig. 17, ceases to rise at the
same age. The curve for the rate of tapping rises markedly
after the year thirteen. This curve is shown in Fig. 18.
There is a break from year eleven to year thirteen, but the
rate of advancement in the adolescent period is about as
rapid as that in the period of childhood.

1 F. W. Smedley. Report of the Department of Child Study and Pedagogic
Investigation, p. 50. Chicago Public Schools, 1902.

2 The growth curves attributed to Gilbert in Figs. 17 and 18 are taken
from J. Allen Gilbert, “Researches on the Mental and Physical Development
of School Children”; in Studies from the Yale Psychological Laboratory,
vol. II, pp. 40-100. 1894.

The differences between the growth curves of these differ- 
ent abilities indicate that there is not a uniform law which 
applies to abilities of all sorts. Further evidence upon the 
lack of uniformity appears in the curves of finding and 
crossing letters in sequence reported by Bickersteth¹ and
shown in Fig. 19. Here we see that the ability to perform
the same sort of acts, but with materials which give it different
degrees of complexity, gives different curves. B and C
are strikingly different in their form from A. Crossing
numbers and crossing letters and numbers together give
the same curve, although one advances at a lower level than
the other. Crossing letters, on the other hand, gives a curve
which reaches its culmination at twelve and one half years.
It could hardly be maintained that the ability required to
find and cross letters is radically different from the ability of
finding and crossing numbers. The form of the growth
curve and the age at which the ability culminates, then, must
be determined by something other than, or in addition to,
the nature of the operation which is performed. We may
inquire further what this condition, or set of conditions, is
by the examination of additional curves.

1 M. E. Bickersteth. “The Application of Mental Tests to Children of
Various Ages”; in British Journal of Psychology, vol. 9, pp. 23-73. 1917.

Pintner and Paterson have given a number of age-growth 
curves in their scale of performance tests which are useful 
for this inquiry. We may contrast two which show widely 
different form and character. The first is shown in Fig. 20,¹
and represents the age progress in a simple construction
puzzle. We see that the ability of the median child in this
test nearly reaches its culmination point at year six. The
dotted lines represent the twenty-five percentile and seventy-five
percentile scores. The curve for the Picture Completion
Puzzle, shown in Fig. 21, is of a widely contrasted
type. It rises more gradually, in the first place, and, in the
second place, it reaches its culmination much later, namely,
at fifteen years. There is a break in the curve earlier than
this, but it comes at year nine, which is different from any of
the ages at which the previous curves have their culmination.

1 Figs. 20 and 21 are taken from R. Pintner and Donald G. Paterson, A
Scale of Performance Tests. D. Appleton & Co., New York, 1917.

Our illustrations have thus far been drawn from the scores 
in single tests, that is, tests of one mental process or a 
relatively narrow group of mental processes. The curve 
for a somewhat more complex set of abilities is represented 
in Fig. 22,¹ which shows the progress in the Trabue Sentence
Completion Test. The middle curve represents the median,
and the other two curves the twenty-five and seventy-five
percentiles. This test gives a curve which shows no
marked break at any level. The steepness of the curve
varies, to be sure, but it advances continuously from the
level of the third grade to that of college graduates.

1 Figs. 22 and 23 are taken from M. R. Trabue, Completion Test Language
Scales, pp. 8, 10 and 37. New York: Columbia University Contributions to
Education, no. 77. 1916.

The difference between this curve and the others seems to
suggest that we are dealing with radically different types of
mental process, and that the various processes follow different
courses of mental development. This interpretation,
however, is hardly supported when we trace separately the
growth curve of individual parts of Trabue’s language completion
scale. The growth curves of typical individual sentences
of this scale are shown in Fig. 23. It will be seen that
these curves differ in their form, in their steepness, and in
the ages at which they advance or reach their culmination.
Since the sentences all belong to the same kind of test, these
differences must be due to the varying difficulty of the sentences
among themselves. This gives us the clew to at least
one of the causes of the differences between the various
age-growth curves.

Before inquiring what the other causes of variation may
be, let us examine some of the growth curves in the intelligence
scales. We may take the scores upon four group tests
as typical of those of point scales in general. The scores of
these tests are based upon very large numbers of cases, and
may therefore be regarded as reliable so far as the number
of cases is concerned. Taking them in the order of the
difficulty of the test, we have the curves for the Pressey
Cross-Out Test,¹ the National Intelligence Test,² the Pressey
Group Point Scale,³ and the Otis Group Intelligence Scale,⁴
shown in Figs. 24 to 27. For the Pressey Cross-Out Scale
and the Otis Group Intelligence Scale, we have the ten percentile,
twenty-five percentile, seventy-five percentile, and
ninety percentile curves, as well as the median curves. For
the National Intelligence Test and the Pressey Group Point
Scale of Intelligence, we have the twenty-five percentile and
the seventy-five percentile curves in addition to the median.
These additional curves will be useful in discussing variability
and the comparison of growth curves of individuals at
different levels. For our present purpose, however, we are
concerned chiefly with the median curves.

1 The Cross-Out Scale norms are taken from a mimeographed sheet
distributed by the authors. They are based on 5504 cases.

2 The National Intelligence Test norms are taken from Supplement 3 to
the Manual of Directions, 1924. They are based on 37,069 cases.

3 The Pressey Group Point Scale norms are taken from “A Group Point
Scale for Measuring General Intelligence, with First Results from 1100
School Children”; in Journal of Applied Psychology, vol. 2, pp. 250-69.
1918.

4 The Otis Group Intelligence Scale norms are taken from the Manual of
Directions, 1921 Revision, p. 62. They are based on 25,226 cases.


It will be remembered that the fundamental question
which we are here considering is the form of the mental-growth
curve. The form may first be considered up to the
point where the break in the curve occurs. This break
occurs at different points in different curves, but up to this
point the typical form which most nearly fits all of the cases
is a straight line. The National Intelligence Test, which is
based upon 37,000 cases, gives a very close approach to a
straight line. The Otis Group Intelligence Test, based upon
25,000 cases, is a close approach to a straight line between
ages ten and eighteen.

Teagarden¹ reports the results of the Army Alpha and
the Pressey Senior Classification Test, given to 408 individuals
in the institution maintained by the Loyal Order
of Moose at Mooseheart, Illinois. The ages ranged from
twelve and one half to twenty years. While the author
makes the statement that the progress curves are negatively
accelerated, an analysis of them shows that this is not an
entirely adequate generalization. (See pp. 65, 66.) Both
the Alpha scores and the Pressey scores advance at a uniform
rate to age seventeen and then fall off.

1 Florence M. Teagarden. A Study of the Upper Limits of the Development
of Intelligence. New York: Teachers College, 1924.

Fig. 27. Age Curves of 25,226 Children in the Otis
Group Intelligence Scale, Advanced Examination,
Showing the Median (M.) and the 10, 25, 75, and
90 Percentiles

(I.Q. of 25 percentile at 12 years is 10/12, or 83.3, and at 16 years it is
14/16, or 87.5. In the Stanford Revision it is 92.)
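
The arithmetic in the note under Fig. 27 follows from the definition of the I.Q. as the ratio of mental age to chronological age, multiplied by 100. The following is a minimal sketch in Python. It assumes, as the fractions 10/12 and 14/16 suggest, that the 25-percentile child stands two years of mental age below his chronological age at both ages. The function name is illustrative only and does not come from the test manual.

```python
def intelligence_quotient(mental_age, chronological_age):
    """I.Q. as the ratio of mental age to chronological age, times 100."""
    return 100.0 * mental_age / chronological_age

# A constant lag of two years (an assumption read off the note's fractions)
# yields a rising I.Q., because the same lag is a smaller share of a larger age.
iq_at_12 = intelligence_quotient(10, 12)   # about 83.3
iq_at_16 = intelligence_quotient(14, 16)   # exactly 87.5
print(round(iq_at_12, 1), iq_at_16)
```

On this reading, percentile lines that run parallel to the median in mental-age units imply an I.Q. that moves toward 100 as the children grow older.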


A very significant fact is that at ages at which one test 
shows a slower rate of growth than usual, another test will 
show a uniform rate. For example, the Otis Group Intelligence
Scale shows an unusually slow rate of growth from
eight and one half to nine and one half years. The Pressey
Primer Scale and the National Intelligence Test, on the other 
hand, show an average rate of growth at this age. At the 
other end of the scale the National Intelligence Test shows 
a marked decrease in rate of growth from fourteen to fifteen, 
whereas the Otis Intelligence Scale shows no decrease what- 
ever at these ages. It is obvious that the rate of growth ex- 
hibited by the increase in scores of a test depends not merely 
upon the actual development of the children, but also upon 
the character of the test itself. The aspect of the test which 
is probably the important one in determining the increase 
in the scores up to the age of maturity is its difficulty. 

Another factor which may also affect the form of the 
growth curve is the selection of cases. If only a certain 
number of grades are tested, the younger children in these 
grades will be the brighter ones of their ages. The older 
children, on the other hand, will be the duller ones. This 
factor of selection may be responsible for some of the dis- 
crepancies which have been pointed out. 
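
The effect of selection can be illustrated by a small simulation. This is a sketch under invented assumptions, not a reconstruction of any of the studies cited: ages are drawn uniformly from nine to fourteen, brightness (mental age minus chronological age) is drawn from a normal distribution, and a hypothetical promotion rule places each child in the grade that matches his mental age. Only two grades are then "tested."

```python
import random
from statistics import mean

def mean_brightness_by_age(grades=(5, 6), n_children=20000, seed=1):
    """Simulate testing only certain grades and report, for each age,
    the mean of (mental age - chronological age) among the children tested."""
    rng = random.Random(seed)
    tested = {}
    for _ in range(n_children):
        age = rng.randint(9, 14)               # chronological age in years
        brightness = rng.gauss(0.0, 1.5)       # mental age minus chronological age
        grade = round(age + brightness) - 6    # hypothetical rule: grade follows mental age
        if grade in grades:
            tested.setdefault(age, []).append(brightness)
    return {age: mean(values) for age, values in sorted(tested.items())}

selected = mean_brightness_by_age()
for age, advantage in selected.items():
    print(age, round(advantage, 2))
```

Under these assumptions the youngest children found in the tested grades are well above the average of their age and the oldest are well below it, so that a growth curve drawn from such a sample is flatter than the true one.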

The divergences from the rule that the development in 
the abilities measured by group point scales is more nearly 
represented by a straight line than by any other form of 
curve are to be explained, apparently, by causes largely 
extraneous to the course of mental development itself. 
Where there are marked divergences from a straight line in a 
particular curve, they are contradicted by another curve; 
and the divergence may be explained either by the character 
of the test, that is, by its lack of adaptation to the group 
which is being measured, or by the selection of cases. 

How, then, may we sum up the facts which are represented 
in these diverse curves? It appears that there are a number 
of factors which may influence the growth curve. In the 
first place, the curve may be determined by the nature of 
the ability which is tested and by the rate of growth in this
ability. This seems to be the determining factor in some of
the tests of sensory capacity. It hardly seems that the
growth curve in these cases is determined by the nature of 
the test. We may conclude, then, that there are certain 
narrow and specialized capacities which have their own rate 
or period of development, and that these culminate at some- 
what different stages. During the period at which these 
abilities are developing, however, they appear to develop at 
a fairly constant rate. 

In the second place, the form of the curve and the age of
its culmination depend upon the difficulty or range of difficulty
of the test. This was brought out clearly in the contrast
between the curves of the different sentences of the
Trabue Completion Test. Inferences have frequently been
drawn concerning the way in which a mental ability develops 
when the form of the curve was due simply to the fact that 
the test was a very easy or a very difficult one for certain of 
the age groups which were being tested. This frequently 
applies in the explanation of a very rapid advance in the 
early years, or a slow advance in the later years. 
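
The dependence of the curve upon difficulty, seen in the contrast between the single Trabue sentences and the scale as a whole, can be sketched numerically. The logistic pass-rate below is an assumption introduced purely for illustration and is not a model used in the studies cited. The underlying ability is taken to advance evenly with age, and only the difficulty of the item is varied.

```python
import math

def pass_rate(age, item_difficulty):
    """Assumed share of children of a given age who pass an item:
    a logistic curve centered on the age at which half the children pass."""
    return 1.0 / (1.0 + math.exp(-(age - item_difficulty)))

def yearly_gains(curve):
    """Increase in score from each age to the next."""
    return [later - earlier for earlier, later in zip(curve, curve[1:])]

ages = list(range(6, 17))
easy_item = [pass_rate(age, 7) for age in ages]     # half pass at age 7
hard_item = [pass_rate(age, 14) for age in ages]    # half pass at age 14
# Composite score: expected number passed out of items spanning ages 4 to 18.
composite = [sum(pass_rate(age, d) for d in range(4, 19)) for age in ages]

for name, curve in [("easy", easy_item), ("hard", hard_item), ("composite", composite)]:
    print(name, [round(gain, 2) for gain in yearly_gains(curve)])
```

The easy item rises quickly and then flattens, the hard item stays flat and rises late, and the composite of items of graded difficulty gains almost the same amount each year, although the same evenly advancing ability lies behind all three.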

Finally, there is apparently a marked difference between 
a score in the composite of a series of tests and the score of 
the simpler components, or specialized abilities. This can 
perhaps be reduced finally to the differences between the 
difficulty of various tests. It is quite evident that the 
median scores in succeeding ages advance more evenly in a 
composite test of wide range of difficulty than in a simple 
test. Such a composite test gives opportunity for those at 
the lower end of the scale of ability to make a score, and at 
the same time for those at the upper end to show their 
superiority. For this reason, the scores in group tests which 
contain large numbers of elements of widely varying diffi- 
culty give a more regular advance than the scores in single 
tests covering a narrower range. The facts appear to indicate,
then, that while the ability to perform particular
narrow tasks may culminate at various periods in one’s life, 
ability in general advances at a fairly steady pace. The 
advance appears to be not so much from one ability to an- 
other as from the ability to perform a task at one level to the 
ability to perform a task at another level, all levels requiring 
the same general kind of ability. This statement may not 
apply to the more highly specialized sensory or motor 
capacities, but it does appear to apply to those which we 
ordinarily denominate as the higher intellectual processes. 
After we have established the facts concerning the nature 
of the curve of mental growth, we have still left the question 
of the interpretation of this fact. To what is the rate of 
mental growth due? Is it the product of the maturing of the 
nervous system and therefore inherent in the child’s organ- 
ism; is it the product of his education; or is it the combina- 
tion of these two factors? We shall find it necessary to deal 
with this question in attempting to determine the age at 
which mental growth culminates. So far as the general 
question is concerned, we can be pretty sure that the mental 
development exhibited by the typical child is influenced 
both by the maturing of his nervous system and by the 
training he has received. Some psychologists would probably
ascribe growth chiefly to inner maturing. The reason
for assigning part of the development to training will be 
discussed under the topic on limits of mental growth. 


3. The relation of growth curves of different individuals to one 
another 


The first question with which we are concerned in con- 
nection with the relationship of growth curves of different 
individuals is this: Does the growth curve of an individual 
in general hold the same relative position at successive ages, 
or does it cross the curves above or below it? This is another
way of putting the question: Is the I.Q. constant, or
variable? Since it is probable that the I.Q. is not perfectly
constant, the question becomes: How nearly constant is the
I.Q.? Is its constancy such that we may predict with any
reasonable certainty from the individual’s I.Q. at any one 
age what it will be at another age, or does it fluctuate so 
widely that no such prediction can be made? 

Typical facts concerning the constancy of the I.Q. are 
represented in Table XXI. This table shows the amount 
of variation which occurs upon retesting by means of the 
Binet test. It will be seen from the table that, in from 
eighty-five to ninety per cent of the cases, the second test 
gives an I.Q. within ten points of the first one. In the mid- 
dle half of the cases the variation is about three points 
downward, or four or five points upward. The average 
change from one test to the other is about five points. 
Finally, the correlation coefficient between two successive 
tests is usually between .85 and .90. These facts indicate a
degree of constancy which warrants us in saying that the
I.Q. is a rather stable quantity, that the lines of mental
growth do not show much crossing, except in the case of
pupils of nearly the same mental capacity, and that the I.Q.
is a reasonably safe basis for prediction when we assume a 
probable error of four or five points. 
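
The measures of constancy named in this paragraph can be computed from any set of paired tests. The sketch below uses invented I.Q.s for eight pupils. They are not data from Table XXI, and the function names are illustrative only.

```python
import math
from statistics import mean, quantiles

def pearson_r(x, y):
    """Product-moment correlation between two lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def retest_summary(first, second):
    """Summarize the agreement between a first test and a retest."""
    changes = [b - a for a, b in zip(first, second)]
    q1, _, q3 = quantiles(changes, n=4)   # bounds of the middle half of the changes
    return {
        "correlation": pearson_r(first, second),
        "share_within_10": sum(abs(c) <= 10 for c in changes) / len(changes),
        "mean_absolute_change": mean(abs(c) for c in changes),
        "middle_half_of_changes": (q1, q3),
    }

# Invented I.Q.s for eight pupils; these are not data from Table XXI.
first_test = [82, 95, 100, 104, 110, 118, 125, 90]
second_test = [85, 93, 104, 103, 121, 120, 124, 96]
summary = retest_summary(first_test, second_test)
print(summary)
```

In this invented set seven of the eight retests fall within ten points of the first test, and the average change is 3.75 points.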

Table XXI. Measures of the Variations in the I.Q. on
Retesting as Found in Several Typical Studies

1 L. M. Terman, The Intelligence of School Children, chap. IX. 1919.
2 H. Rugg and C. Colloton. “Constancy of the Stanford-Binet I.Q. as Shown by
Retests”; in Journal of Educational Psychology, vol. 12, pp. 315-22. 1921.
3 S. C. Garrison. “Additional Retests by Means of the Stanford Revision of the
Binet-Simon Tests”; in Journal of Educational Psychology, vol. 13, pp. 307-12. 1922.
4 L. S. Rugg, “Retests and the Constancy of the I.Q.”; in Journal of Educational
Psychology, vol. 16, pp. 341-43. 1925.

The actual uniformity of a child’s intellectual capacity is
probably greater than is indicated by these figures which
give the constancy of the I.Q. The tests upon which the
constancy of the I.Q. is determined were made at intervals
of a year or more. The changes in the standing of the
pupil between the succeeding tests are obviously due both to
the changes which may have taken place in his intellectual
capacity and to errors in the test. If the retest is made
immediately there will still be considerable variation in the
standing of an individual pupil. In order to determine
how much of the variation which occurs in retests made after
a longer period of time is due to actual variation of the pupil’s
capacity, it is necessary to make allowance for this error.
We may get some notion of the amount of allowance which
needs to be made by comparing the correlation between the
standing of pupils in successive tests made immediately
and in tests which have a long interval between them. The
comparison yields the surprising fact that the correlation
after an interval of time is not much less than when the retest
is given immediately. This indicates that a large amount
of the fluctuation which we have been noting is due to
variability in the performance of the pupil, rather than to an
actual change in the pupil’s capacity. A remote prediction
on the basis of the I.Q. is not much more hazardous than is
immediate prediction, since a large amount of the variation
which we find to occur upon retests will occur when the
retest is immediate.

We have seen that the first typical fact about the rela- 
tionship between the various individual growth curves is 
that there is not much crossing and recrossing between 
them, except among those which are adjacent to each other. 
Children retain much the same relative position at various 
ages. A second question which may be asked is this: Do the 
curves get farther apart as children grow older, do they get 
nearer together, or do they remain the same distance apart? 
Are older children more alike, less alike, or equally alike in 
the degree of their intellectual capacity as compared with 
younger children? This question may most readily be 
discussed in connection with the facts concerning the variability
of scores of successive ages. We will therefore defer
the consideration of it until a later section on variability.


4. Relation between level of intelligence and age of maturity 

The third question relates to the age at which pupils of 
different levels of intelligence reach their maturity. The 
popular view is that children who are bright in early child- 
hood mature early, and therefore lose their advantage. 
Pupils who mature slowly, on the other hand, are thought to 
continue longer and to catch up, at least in a measure, with 
their more precocious companions. This view is perhaps
the outgrowth of the theory of the meaning of infancy in 
evolution, which was expounded by John Fiske. This 
theory is that the higher animals possess a longer period of 
infancy and that this gives opportunity for a higher men- 
tal development as well as for the development of social 
organization. This theory has been carried over to the 
interpretation of the mental development of individual 
children. 

The prevailing scientific theory is quite the opposite of
the popular one. It is common among psychologists to
believe that the feeble-minded child reaches a stage of mental 
arrest comparatively early. The normal child is thought to 
continue his development longer, and the gifted child to 
continue to grow intellectually to a still later age. 

A third possibility is that children at different levels of 
intelligence reach the culmination of their mental growth at 
about the same age. These three conceptions are repre- 
sented diagrammatically in Fig. 28. Part A represents the 
popular view, Part B, the prevailing scientific view, and 
Part C the third possibility. The question now is: Which 
of these hypotheses is in closest agreement with the facts as 
they are revealed by our mental tests? 

The bearing of the evidence of mental tests upon our 
question seems to be clear. If we examine the four figures 
(24 to 27) which summarize the age scores of the four tests 
which we have presented, we will see that in all cases the
break in the scores of children at different levels comes at 
the same age. In the National Intelligence Test (25), for 
example, the break comes between age fourteen and age 
fifteen. In the Otis Advanced Examination (27) the break 
comes between age eighteen and age nineteen. The break 
comes at the same point for the lower percentiles as it does 
for the higher percentiles. A particular child reaches his 
level of intellectual capacity, whatever that level may be, 
at approximately the same age as other children. This does 
not mean that there is not individual variation in the age of 
maturing. There probably is individual variation, and it 
may be large in amount. It does mean, however, that this 
variation is not correlated with intelligence — that the bright 
child is as likely to be early or late in maturing as is the 
average child or the stupid child. 

Fig. 28. Three Possible Relationships between Level of
Intelligence and Age of Mental Maturity

What is the practical bearing of this fact of similarity in
time of maturity of children of different levels of intelligence?
It might seem at first sight to have a bearing upon the type of
treatment which should be given to backward or to superior
children. The question, however, is probably too complicated
to be dealt with in an offhand way, and we need more
investigation than has been given the question at the
present time in order to arrive at an answer. There is another
important practical application, however, which
appears on the face of the facts. This relates to prediction. 
If the facts are as has been stated, we should not expect, on 
the one hand, that the backward child should ultimately 
fall far behind the position which he occupies as a child. 
Nor should we expect him to gain materially in later youth 
over his childhood position. The same may be said of the 
superior child. In other words, prediction is somewhat 
more certain under the facts as they appear to exist than it 
would be under either the popular view or the prevailing 
scientific view. 


5. The variability in intelligence in succeeding ages 


The problem of this section may be stated in terms of the 
relationship between the growth curve of various indi- 
viduals. Three possibilities exist. In one case the growth 
curves might diverge; in another case they might run par- 
allel; and in a third possible case they might converge at the 
later ages. We may put this problem in other terms by ex- 
pressing it in terms of the variability in the intelligence of 
children at various ages. It is obvious that if the growth 
curves diverge the variability is on the increase. If they run 
parallel the variability remains the same, and if they con- 
verge the variability decreases. 
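
The three possibilities may be told apart by comparing the distance between two percentile curves at the youngest and at the oldest age. The sketch below is illustrative only. The scores are invented, and the five per cent tolerance for calling two gaps equal is an arbitrary assumption.

```python
def spread_trend(lower_curve, upper_curve, tolerance=0.05):
    """Compare the gap between two percentile curves at the first and last age."""
    first_gap = upper_curve[0] - lower_curve[0]
    last_gap = upper_curve[-1] - lower_curve[-1]
    if last_gap > first_gap * (1 + tolerance):
        return "diverge"      # variability increases with age
    if last_gap < first_gap * (1 - tolerance):
        return "converge"     # variability decreases with age
    return "parallel"         # variability remains the same

# Invented 25- and 75-percentile scores at four successive ages.
print(spread_trend([20, 30, 40, 50], [30, 42, 54, 66]))   # diverge
print(spread_trend([20, 30, 40, 50], [30, 40, 50, 60]))   # parallel
print(spread_trend([20, 32, 44, 56], [30, 40, 50, 60]))   # converge
```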

The customary scientific view is that variability in 
capacities of different individuals grows greater at the time
of adolescence, and that children in the early stages of ad- 
olescence differ from one another much more widely than do 
younger children. We have already seen, in discussing the 
intelligence quotient, that one of the conditions which may 
make the intelligence quotient valid is a regular increase in 
variability with successive ages. The question is of some 
theoretical as well as of some practical importance.

The customary view has been challenged by Henmon in a
study reported in 1922.¹ Henmon bases his view upon a
summary of the data which had been reported from a large 
number of mental tests. These tests were given by Gilbert, 
Pyle, and Bickersteth. The data are available from ages 
seven to eighteen. The summary is presented in Table 
XXII, which is copied from Henmon’s article (page 24). 


TABLE XXII. SUMMARY OF VARIABILITY IN MENTAL TRAITS

[Columns: number of cases; coefficient of variability.]

The conclusions which one draws from such data will depend
in part upon what is taken as a measure of variability. The
coefficient of variability which Henmon uses is a ratio. It
expresses the amount of variation in proportion to the size
of the measure, that is, to the average score for the age in
question. Thus, if the average is 50 and the standard deviation
or the mean deviation is 10, the coefficient of variability
will be .20. If, however, the average is 100 and the
mean deviation or standard deviation is 10, the coefficient
of variability will be .10. In consequence, if the score
increases with age and the standard deviation remains the
same, the coefficient of variability will decrease. This must
be kept in mind in comparing Henmon’s data with those from
other studies.

1 V. A. C. Henmon and W. F. Livingstone. “Comparative Variability
at Different Ages”; in Journal of Educational Psychology, vol. 13, pp. 17-29.
1922.

It is apparent from Henmon’s table that from ages seven 
to ten there is a marked decrease in the coefficient of varia- 
bility. From ages ten to seventeen there is very little 
change. The coefficient of variability remains practically 
constant. This means that from ages ten to seventeen the 
variation expressed in absolute units increases about pro- 
portionately to the increase in the score itself. If we made 
a chart showing the age curves of the median child, and of 
those above and below the median, there would be a diver- 
gence in the lines of progress. 
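
The coefficient of variability is the deviation divided by the average, so the two cases given earlier (an average of 50 or of 100 with a deviation of 10) can be checked directly. The sketch below also shows what a constant coefficient implies when the average rises. The three averages are invented for illustration and are not taken from Henmon's table.

```python
def coefficient_of_variability(average, deviation):
    """Deviation (standard or mean) expressed as a share of the average."""
    return deviation / average

print(coefficient_of_variability(50, 10))    # 0.2
print(coefficient_of_variability(100, 10))   # 0.1

# If the coefficient stays at .20 while the average rises, the deviation
# in score units must rise in proportion, and the lines of growth diverge.
coefficient_percent = 20
averages = [50, 75, 100]                     # invented averages at three ages
deviations = [a * coefficient_percent / 100 for a in averages]
print(deviations)                            # [10.0, 15.0, 20.0]
```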

The data we have been reviewing are derived from single 
tests, and for the most part from tests of rather simple 
mental processes. Let us turn to the results of group intelli- 
gence tests. We may refer again to the charts which give 
the results of the four point scales, and which are shown in 
Figs. 24 to 27. It is quite clear that in all four of these cases 
the growth curves of children above and below the median
run remarkably parallel to each other and to the median
line. In the Pressey Cross-Out Test the range of the middle
fifty per cent of the scores in successive ages is as follows:

Interquartile range
Coeff. of variability    .39    .32    .25

It is evident that there is no increase in the range of scores
in the higher ages. The coefficient of variability diminishes 
sharply. In the case of the National Intelligence Test, the 
interquartile range, which includes the middle fifty per 
cent of the cases, is as follows: 


[Table: interquartile range and coefficient of variability of scores in the
National Intelligence Test, by age.]

At the ages fourteen and fifteen, the range is somewhat less 
than it is at the intermediate ages. During the ages from 
nine to thirteen, however, the range is remarkably constant. 
The coefficient of variability decreases rapidly as before. 
The variability in the Pressey Group Scale of Intelligence 
is expressed numerically in the following table: 


(Ages in successive columns, ending with 13, 14, 15, 16; "..." marks
entries that are illegible.)

Boys    Interquartile range      33.7  37.5  44.8  50.8  ...  29.0  23.8
        Coeff. of variability    .51  ...  .28  .28  .16

Girls   Interquartile range      ...  25.6  23.8
        Coeff. of variability    ...  .20  .16


We have here again the same picture as in the two other 
cases. There is a greater variability in the middle than at
the extremes. Except in the case of the ten- and eleven-year-old
boys, however, the difference is not great. The increase
which does occur is earlier than the adolescent period and
can therefore not be accounted for by the adolescent development.
The coefficient of variability shows the same
decrease as in the other cases.

The recent investigation by Teagarden¹ seems to contradict
the findings from the other scales, and the author concludes
“. . . variability is found to increase in all the group
test measures.” The data shown in Table XXIII, however,
indicate that the variability of the Binet mental ages does
not increase; that the variability of the Pressey scores increases
somewhat, but less rapidly than the average scores,
so that the coefficient of variability decreases; and that
only the Alpha scores show an increase in variability nearly
equal to the increase in the average. The coefficient of
variability, even here, remains constant.


Table XXIII. The Standard Deviation of Scores in Successive
Ages in Teagarden’s Study

(The scores which Teagarden gives by half years are combined by the
method used in smoothing curves.)

(Successive ages from left to right; the age headings and the coefficients
of variability are not legible.)

Binet M.A.  S.D.   23.3  21.2  21.8  20.9  21.9  21.4  21.3  21.1  20.8  21.1
Alpha       S.D.   25.5  24.2  24.2  26.5  27.3  28.0  28.6  29.9  31.4  31.0
Pressey     S.D.   13.6  13.3  13.8  14.8  16.0  15.8  17.0  17.7  19.1


In summary, the facts seem to show that, in most cases,
the scores of single tests have a somewhat increased variability
with age, but that the coefficient of variability remains
constant. In the case of scores of group tests, however, the 
variability does not increase but remains constant. If the 
coefficient of variability were calculated, it would decrease. 
Our conclusion upon the evidence thus far presented, then, 
is that the variability in the scores of children from age to 
age remains relatively constant. 
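
The relation between the two measures of spread used in this summary can be checked with a little arithmetic. The following is a minimal sketch in Python, assuming the usual definition of the coefficient of variability as the standard deviation divided by the mean. The mean scores and standard deviations are hypothetical figures chosen only for illustration, not values taken from any of the tables.

```python
def coefficient_of_variability(standard_deviation, mean):
    """Standard deviation expressed as a fraction of the mean (assumed definition)."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    return standard_deviation / mean


# Hypothetical average scores at three successive ages (illustrative only).
mean_scores = [40.0, 60.0, 80.0]

# Case A: the standard deviation stays the same while the average rises,
# so the coefficient of variability falls from age to age.
constant_sd = 12.0
cv_when_sd_constant = [
    coefficient_of_variability(constant_sd, m) for m in mean_scores
]

# Case B: the standard deviation rises in proportion to the average,
# so the coefficient of variability stays the same.
sd_ratio = 0.3
cv_when_sd_proportional = [
    coefficient_of_variability(sd_ratio * m, m) for m in mean_scores
]

print(cv_when_sd_constant)
print(cv_when_sd_proportional)
```

Case A corresponds to what the text reports for the group scales, and Case B to what it reports for single tests.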

1 Op. cit. 
6. The evidence from I.Q.s and mental ages


Conclusions have sometimes been drawn concerning the 
variability of intelligence from age to age, as well as con- 
cerning the form of the mental growth curve, from the dis- 
tribution of I.Q.s from age to age, or from the succession 
of I.Q.s of the same individuals which are derived from repeated
measurement. Since the I.Q. is a ratio, and, as we
have already seen, the I.Q. depends upon the form of the 
curve of mental development and also upon the distribution 
of capacities in succeeding ages, it is not possible to infer 
what the nature of either the one or the other of these two 
factors is from the I.Q. itself, or from the distribution of 
I.Q.s. The lines of mental growth of children of various
degrees of intelligence, expressed in the form of I.Q., therefore
do not help us in determining what the variability of
capacities of the succeeding ages is.

The same must be said of mental ages. Kuhlmann, for 
example, has presented data from repeated measurements 
to show that the mental ages of feeble-minded children 
advance less rapidly than the mental ages of normal children. 
The lower the grade of feeble-mindedness, furthermore, the
slower is the advancement in mental age.¹ The distribution
of the amounts which were gained per year in six hundred
cases is given as follows: a gain of more than twelve
months in mental age per year, 4.8 per cent; gains of 0-11
months per year, 68 per cent; gains of 0, 11 per cent; gains of
less than 0, 16 per cent.

In any scale for which the I.Q. is an appropriate measure, 
of course, the mental ages of inferior individuals must ad- 
vance more slowly than do the mental ages of normal or 
superior individuals. Otherwise, the ratio of mental age to 

1 F. Kuhlmann. “The Results of Repeated Mental Examinations of Six
Hundred and Thirty-Nine Feeble-Minded over a Period of Ten Years”; in
Journal of Applied Psychology, vol. 5, pp. 195-224. 1921.
chronological age would not remain constant. For example,
assume two individuals as follows:


            CASE 1            CASE 2
Age      M.A.    I.Q.      M.A.    I.Q.
  6        4      66         6     100
 12        8      66        12     100
Gain     4 years           6 years


Case 1, with an I.Q. of 66, has gained four years in mental 
age between ages six and twelve, while Case 2, with an I.Q. 
of 100, has gained six years in mental age. The same fact 
is shown graphically in Fig. 29. By referring back to the 
discussion of the I.Q., we see that this difference in the rate
of advancement of the mental ages of those of different levels 
of intelligence is explicable on the ground, either of the form 
of the growth curve, or of the variability of abilities at 
successive ages, or both. 
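
The arithmetic of the two cases can be reproduced directly from the definition of the I.Q. as one hundred times the ratio of mental age to chronological age. The sketch below is in Python; the function names are illustrative only. It shows that a constant I.Q. by itself fixes the yearly gain in mental age.

```python
def intelligence_quotient(mental_age, chronological_age):
    """I.Q. = 100 * M.A. / C.A."""
    return 100.0 * mental_age / chronological_age


def mental_age_for(iq, chronological_age):
    """Mental age implied by a given I.Q. at a given chronological age."""
    return iq * chronological_age / 100.0


def gain_in_mental_age(iq, age_start, age_end):
    """Gain in mental age between two ages if the I.Q. stays constant."""
    return mental_age_for(iq, age_end) - mental_age_for(iq, age_start)


# Case 1: M.A. 4 at age 6 (an I.Q. of 66.7, printed as 66 in the table).
iq_case_1 = intelligence_quotient(4, 6)
# Case 2: M.A. 6 at age 6 (an I.Q. of 100).
iq_case_2 = intelligence_quotient(6, 6)

gain_case_1 = gain_in_mental_age(iq_case_1, 6, 12)
gain_case_2 = gain_in_mental_age(iq_case_2, 6, 12)

print(round(gain_case_1, 2), round(gain_case_2, 2))
```

The slower advance of Case 1 follows from the ratio alone, which is why it cannot be used as independent evidence about the growth curve or about variability.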

The same remark applies to Kuhlmann’s further finding. 
He presents a curve to show that the I.Q.s of inferior children
decline as they grow older. That is, their rate of
advancement in mental age is not only slower than that of
normal children, but is slower, even, than the rate required
to keep the I.Q. constant. This fact, however, does not
create any new problem, nor does it necessarily mean, as 
Kuhlmann infers, that there is an increase in variability in 
the later ages. The fact might be due equally well to a 
greater curvature in the general line of mental growth than 
is required for a constant I.Q., as shown in Fig. 30. 
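
The point can be illustrated numerically. The sketch below, in Python, is built entirely on assumptions made for the sake of the example: a child is supposed to stay a fixed number of score points below the average at every age (that is, variability does not change), and two hypothetical forms of the average growth curve are compared. Neither curve is taken from Kuhlmann or from any scale discussed here.

```python
import math

SATURATION_SCALE = 5.0  # hypothetical constant for the strongly curved line


def log_curve(age):
    """Assumed average score at each age: logarithmic growth."""
    return math.log(age)


def log_curve_inverse(score):
    """Age at which the average child reaches the given score."""
    return math.exp(score)


def curved_line(age):
    """Assumed average score at each age: a more sharply flattening curve."""
    return 1.0 - math.exp(-age / SATURATION_SCALE)


def curved_line_inverse(score):
    """Age at which the average child reaches the given score."""
    return -SATURATION_SCALE * math.log(1.0 - score)


def iq_by_age(curve, inverse, deficit, ages):
    """I.Q.s of a child who scores `deficit` points below average at every age."""
    quotients = []
    for age in ages:
        score = curve(age) - deficit
        mental_age = inverse(score)  # age at which the average child gets this score
        quotients.append(100.0 * mental_age / age)
    return quotients


ages = [6, 9, 12]
iq_on_log_curve = iq_by_age(log_curve, log_curve_inverse, 0.3, ages)
iq_on_curved_line = iq_by_age(curved_line, curved_line_inverse, 0.05, ages)

print([round(q, 1) for q in iq_on_log_curve])
print([round(q, 1) for q in iq_on_curved_line])
```

Under the logarithmic curve the I.Q. is the same at every age; under the more sharply flattening curve it falls, although the child's distance below the average never changes.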

FIG. 30. ILLUSTRATION OF A FORM OF GROWTH CURVE WHICH
GIVES A FALLING I.Q.

Let it be repeated that a constant I.Q., or even a
declining I.Q., which is found to hold for a particular type of
scale, cannot be interpreted as an indication of the nature 
of mental development in general, unless other evidence 
agrees with it. The constant or slightly falling I.Q. with 
the Binet type of scale probably indicates a diminishing 
rate of growth for that scale, possibly accompanied by a 
widening variability, but the growth curves of point scales 
and the inconstant I.Q.s which are derived from them, 
indicate that the real mental growth is more nearly uniform 
in rate and in distribution from year to year. 

It is recognized that the large collections of scores from 
intelligence tests on which our conclusions have been mainly 
based may serve to conceal the facts which we might gain 
from a more minute study of individuals. The groups 
which form the basis of the averages for successive ages are 
probably based upon cases which are selected differently 
in the succeeding ages. The conclusions which we draw, 
therefore, may well be somewhat tentative, and subject to 
modification from the repeated measurement of the same 
individuals from year to year. Our tentative conclusion, 
however, is that individuals of different degrees of intelli- 
gence, at least within the great middle range of cases, ad- 
vance at about the same rate. The gain which they make 
from year to year is not greatly different. The consequence
is that the distribution of scores of succeeding years does not 
vary greatly. We may expect backward children to gain 
about as much from year to year and to continue to gain 
about as long as normal children or bright children. This 
conclusion is quite contrary to the prevailing opinion, but it is 
one which seems to be supported by the best available facts. 


7. The age limit of mental growth 


Perhaps the first clear-cut estimate of the age at which 
intellectual growth ceases was made by Terman, in the 
standardization of the Stanford Revision of the Binet
Scale. Terman, it will be remembered, found it necessary
to assume such a limit in calculating intelligence quotients 
for the upper ages. The intelligence quotient is the ratio 
of the individual’s mental age to his chronological age. 
It is obvious that if the mental age ceases to increase at a 
given life period, and if we continue to find the intelligence 
quotient in the same fashion for individuals who have ad- 
vanced to an age beyond this point, their intelligence quo- 
tient will regularly diminish as the chronological age 
advances. It is therefore necessary to use, in place of the 
chronological age, a fixed number for all those whose age is 
greater than that at which the mental growth ceases. This 
age was taken by Terman to be sixteen years. 
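
The need for a fixed divisor can be seen from a short calculation. The following Python sketch assumes an individual whose mental age stops at sixteen and compares the quotient found with the actual chronological age against the quotient found with the divisor held at sixteen. The function names are illustrative only.

```python
ADULT_DIVISOR_AGE = 16.0  # the limit taken by Terman, as stated in the text


def iq_with_actual_age(mental_age, chronological_age):
    """Quotient found by dividing by the actual chronological age."""
    return 100.0 * mental_age / chronological_age


def iq_with_fixed_divisor(mental_age, chronological_age, limit=ADULT_DIVISOR_AGE):
    """Quotient found when the divisor is never allowed to exceed the limit."""
    return 100.0 * mental_age / min(chronological_age, limit)


# An individual whose mental age has reached sixteen and no longer increases.
final_mental_age = 16.0
ages = [16, 20, 32]

uncorrected = [iq_with_actual_age(final_mental_age, a) for a in ages]
corrected = [iq_with_fixed_divisor(final_mental_age, a) for a in ages]

print(uncorrected)  # the quotient shrinks as the person grows older
print(corrected)    # the quotient stays where it was at sixteen
```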

The estimate of sixteen years as the age at which mental 
growth ceases was pretty generally accepted until the ad- 
vent of group mental testing and the interpretation of the 
results of mental testing in the army. When the achieve- 
ment of the adults in the army mental tests was compared 
with the achievement of children on similar tests, it was 
found that the average adult performance was equal only to 
that of children of about thirteen and one half years. This 
has led to the natural conclusion that mental growth ceases 
at about this age. In other words, the army tests are 
thought to indicate that the general capacity to learn does 
not increase beyond thirteen or fourteen years of age. This 
conclusion has aroused intense and widespread controversy 
over the question of the mental age of adults and of the age 
at which mental growth ceases. Before examining further 
the results from the army tests, let us review the facts as 
they are presented in the reports of the application of group 
tests to children. 

We have already seen that the form of the mental growth 
curve, as derived from different tests, varies. The same 
thing is true of the limit of mental growth. In the case of
the four tests which are presented at the beginning of the
chapter, we have four different ages at which the rate of
mental growth is retarded. In the Pressey Cross-Out Scale, 
the advancement in the scores begins to decrease at age 
nine; in the National Intelligence Test at age fourteen; and 
in the Otis Intelligence Scale at age seventeen or eighteen. 
Since the limit of mental growth has often been inferred 
from scores of a single scale, this variation is noteworthy. 
It is clear that, in order to determine the age at which mental 
growth ceases, it is necessary to employ a test which we are 
sure will give opportunity for the most mature individuals 
to demonstrate their superiority to those of less maturity. 
Obviously, our conclusions must be drawn from the results 
of the test which shows the most prolonged advancement. 
In the cases before us, this is the Otis test. This test in- 
dicates that there is mental growth at least until eighteen 
years of age. The earlier arrest in the case of the other tests 
is not an evidence of a cessation of mental growth, but 
rather of a limitation in the difficulty of the test. As far 
as the evidence from the large-scale application of mental 
tests to children of different ages goes, then, we must con- 
clude that mental development continues even past the age 
of sixteen, and at least up to seventeen or eighteen. 

The use of tests of different children at different ages as a 
basis for conclusion regarding a limit for mental growth is 
open to serious criticism. This criticism is based upon the
fact that the groups at successive ages are not equally representative.
It is a well-known fact that beyond the fifth
grade there is considerable elimination from the school of 
the children who are below average in intelligence. Those 
that remain in the upper grades of the elementary school 
and in the various grades in the high school are successively 
a more and more selected lot. It hardly seems possible that 
this selection is great enough to produce an appearance of
mental growth when it has ceased altogether, but it un- 
doubtedly enhances, or at least it may enhance, the apparent 
rate of growth. 

A more reliable type of evidence is drawn from repeated 
measurements of the same children. If we retest children 
of various ages after the lapse of a year we are enabled to 
make a direct comparison of the amount of gain which is 
made at the different ages. Three studies of this sort will 
be referred to.

A report on repeated tests of the same 172 children, ages 
nine to fifteen, has been made by Brooks.¹ The retests were
made with a large number of tests of diverse kinds. Some 
of these were mental tests, some of them educational tests. 
Some were tests of rather simple mental operations, and some 
of the higher, or more complex, mental operations. The
results of the retests, showing the gain made by the children
of various ages from nine to fifteen, are reproduced in
Table XXIV.


TABLE XXIV. MEAN GAINS IN SIMPLE, MEMORY, HIGHER,
INFORMATIONAL, AND COMBINED FUNCTIONS, EXPRESSED AS
THOUSANDTHS OF THE MEAN STANDARD DEVIATION OF AGES
ELEVEN, TWELVE, AND THIRTEEN FOR EACH SEX


SIMPLE     MEMORY     INFORMATIONAL


1 Fowler D. Brooks. Changes in Mental Traits with Age Determined by 
Annual Retests. Bureau of Publications, Teachers College, New York. 
This table indicates that there was substantial gain made
by the children up to and including the period from fourteen
to fifteen in all four types of tests which were studied. While
the gain in the last year was slightly less than in the preced- 
ing years, it was considerable, and gave no evidence of im- 
mediate cessation. 

The most extensive study of the gains made in repeated 
measurements has been reported by Thorndike.¹ The investigation
includes 8564 cases. A rather elaborate form of
test, which is similar to the customary general intelligence
group tests, was given to high school students, and then 
repeated after a year. Two forms of the test were used, 
part of the pupils taking form A the first time, and part of 
them form B. The difficulty of the two forms was equated. 
The practice effect was calculated by comparing the scores 
of a certain number of pupils who took the test on the second 
occasion, but had not had the first test, with those who had 
had the first test the year previously. The immediate net 
gain for three grades of the high school was given as follows: 
grade ten, 10.5; grade eleven, 11.7; grade twelve, 11.5.
This gain is estimated by the author as about equivalent to 
ten months’ mental age. It therefore appears that, at least 
up to the twelfth grade, high school pupils gain practically 
as much as would be gained by younger children. In other 
words, the evidence seems to be that mental growth con- 
tinues substantially undiminished to the end of the high 
school period. This is a highly impressive body of evidence 
on our problem. 

The evidence from Thorndike’s investigations is supported 
by a study by William Johnson. This study is reported in a 
Doctor’s thesis.² Johnson tested over 500 high school


1 E. L. Thorndike. “On the Improvement in the Intelligence Scores from
Fourteen to Eighteen”; in Journal of Educational Psychology, vol. 14, pp.
513-16. 1923.

2 William H. Johnson. The Mental Growth Curve of Secondary School
Students. The University of Chicago Library. 1923.

students with the Chicago Intelligence Test, and retested
them at the end of the year. The students ranged in age 
from twelve to seventeen on the first test. The gain in 
points made during the year is represented in the following 
table: 

TABLE XXV


AGE AT SECOND TEST


4 [15 | 16 | 17 18 
s1 | 107 72 | 35 


9.4 


The results of retests seem to indicate that mental growth 
continues even beyond the age of sixteen, and at least up to 
the age of eighteen. While this view seems to be so well 
supported, it is not the prevailing view and we must deal 
with the objections to it. 

Calculations are presented by Dearborn¹ to show that the
I.Q. based upon the mental age of fourteen and one half 
gives a better distribution than when it is based upon age 
sixteen. The test used was his own group intelligence test. 
The inference is that mental growth ceases, or nearly ceases, 
at fourteen and one half. If mental growth ceases at this 
age, of course it should be used as the basis of calculating 
the I.Q., as has already been said, but evidence has been
presented to show that mental growth does not cease at
fourteen and one half. The conclusion which must be
drawn, then, is that the Dearborn test is too easy for pupils 
beyond this age. 

A contrary view is expressed also in a quotation from the 
Army Report.² The Report states (page 785): “It appears

1 W. F. Dearborn. “Intelligence Quotients of Adults and Related
Problems”; in Journal of Educational Research, vol. 6, pp. 807-25. 1922.

2 Psychological Examining in the United States Army, chap. Xt.

that the intelligence of the principal sample of the white
draft when transmuted from the Alpha and Beta Examina- 
tion to determine the mental age is about thirteen years 
(13.08). Here we have a measure of the average intelligence 
of nearly one hundred thousand white recruits.” The 
Report then goes on to comment upon the statement further 
by qualifying it on account of the probable selection of cases 
in the army. A certain number of the men became officers
or enlisted as officers before the draft. These were men of 
high intelligence and their elimination would somewhat 
lower the score. This lowering, however, was estimated as 
not being sufficient to greatly affect the average for the draft. 

The Report continues, on page 789, as follows: 

We know now approximately from clinical experience the 
capacity and mental ability of a man of thirteen years’ mental age. 
We have never heretofore supposed that the mental ability of this 
man was the average of the country, or anywhere near it. A 
moron has been defined as any one with a mental age of from seven 
to twelve years. If this definition is interpreted as meaning any 
one with a mental age less than thirteen years, as has recently been 
done, then almost half of the white draft (47.3 per cent) would 
have been morons. Thus it appears that feeble-mindedness as it 
is presently defined is of much greater frequency of occurrence than 
had been originally supposed. 


We may add some calculations on our own account. If the 
true mental age of adults is what seems to be indicated from
the army test (that is, if the significance of mental age is
what we supposed it to be), then we have a curious situation.
From the table on page 790 we find that 30.3 per cent of the 
white draft of the principal sample had a mental age of below 
twelve, that is 11.9 or below. Translated into terms of I.Q. 
this means that 30.3 per cent had an intelligence equivalent 
to an I.Q. of 75 or below. The number of children whose 
I.Q. is 75 or below, and whose ultimate mental age is below 
twelve, is, if we may rely upon Terman’s calculation, 2.3 per 
cent. Thus 2 per cent of children are morons, and 30 per
cent of adults are morons, and somewhere between child- 
hood and adulthood 28 per cent have fallen into this estate. 
An argument which involves such a conclusion evidently 
has a fallacy in it somewhere. 
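
The figures in this argument can be laid side by side in a brief calculation. The Python sketch below assumes that the adult I.Q. is found with the divisor of sixteen years described earlier in the chapter, an assumption which reproduces the text's figure of 75 for a mental age of twelve. The two percentages are those quoted above from the Army Report and from Terman.

```python
ADULT_DIVISOR_AGE = 16.0  # assumed divisor for adults, following Terman's limit


def adult_iq(mental_age):
    """I.Q. of an adult, using the fixed adult divisor (assumption of this sketch)."""
    return 100.0 * mental_age / ADULT_DIVISOR_AGE


# A mental age of twelve is the boundary used in the argument.
boundary_iq = adult_iq(12.0)

# Percentages quoted in the text.
per_cent_of_draft_below_boundary = 30.3      # white draft, principal sample
per_cent_of_children_below_boundary = 2.3    # Terman's calculation

unexplained_per_cent = round(
    per_cent_of_draft_below_boundary - per_cent_of_children_below_boundary, 1
)

print(boundary_iq)
print(unexplained_per_cent)
```

The difference is the proportion that would have to sink below the boundary between childhood and adulthood if both sets of figures were taken at face value.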

It is necessary to attempt to clear up the paradox. How
does it come that the mental age of adults is apparently so 
low? One explanation which has been offered has already 
been referred to. The men of the draft are not wholly repre- 
sentative of the adult male population. It hardly seems pos- 
sible, however, that this can fully account for the enormous 
discrepancy between the standing of children and adults. 
The explanation is probably to be found in the lack of com- 
parability in the scores of the typical poorly educated adult 
and of the child who is in school. The adult whose educa- 
tion ceases at fourteen in all probability deteriorates greatly 
in his ability to do the type of thing required by group tests. 
Furthermore, the conditions under which tests have been 
given have been proved time and again to have a large in- 
fluence upon the reaction of those being tested. Undoubt- 
edly the conditions of the testing in the army were much 
less favorable than in the ordinary school. These are 
probably the major reasons for the discrepancy between the 
standing of the men in the army and children upon the same 
test. The army test results do not, therefore, discredit the 
very cogent evidence that mental growth, so far as we can 
judge from tests, continues well through adolescence. 


CHAPTER XIV 
THE EDUCATIONAL USES OF TESTS 


THE plan of the chapter is as follows. We shall first review
briefly certain of the basic facts which underlie the application
of tests in education. Some of these facts have been
brought out fully in previous chapters, and some of them 
have been referred to only incidentally. We shall next 
consider the more general uses of mental tests by the super- 
intendent or the principal. These general uses are con- 
cerned with gathering facts about the school system which 
have a bearing upon administrative problems. The re- 
mainder and the larger part of our discussion will relate to 
the administrative use of tests in dealing with the individual. 
This includes classification, guidance and counseling, and 
other aspects of handling of the individual pupil. 


I. BASIC FACTS UNDERLYING THE APPLICATION OF TESTS
TO EDUCATION 


1. Mental growth 

Mental tests have given us much detailed information 
concerning the mental development of children from year to 
year. This information has been reviewed in detail in 
Chapter XIII. We shall simply summarize some of the out- 
standing facts which are important for understanding the 
uses of tests in the school. Mental tests have shown clearly 
that the pupil grows in mental power or in learning capacity 
from the time of entering the kindergarten until later adoles- 
cence. There is not entire agreement concerning the com- 
parative rate of growth during different periods of child- 
hood and youth; but the more recent and reliable evidence 
seems to show that the rate of growth is more nearly uniform
throughout these periods than investigators formerly
supposed. We can count on increased capacity to do 
school work or on capacity to do work of a higher level or 
more difficult character, certainly up to the age of fourteen 
and fifteen, and very probably up to the age of eighteen or 
twenty. This general fact has been recognized in the or- 
ganization of the curriculum into levels which make higher 
and higher intellectual demands upon the pupils. 

Of more direct bearing upon the problem of classification 
and individual guidance are the facts concerning the relation- 
ship between the growth curves of different individual 
children. The first important fact is that these curves, in 
general, run parallel to one another. The curves of mental 
growth of different children do not cross. Assuming similar- 
ity of treatment, if a child’s mental level is below that of 
another at one age, it will in all probability remain below at 
succeeding ages. Whether the lower-grade child’s ability
falls farther behind that of his superior, or whether the
difference between abilities remains the same in amount, is a
matter of detail upon which there is still difference of opinion.
The general fact, however, is established: that
children for the most part do not interchange their intellect- 
ual rank, but retain the same relative rank throughout their 
lives. That this is the case is shown by the fact that the 
correlation between the mental test scores of a group of chil- 
dren taken several years apart is practically as high as the 
correlation between the two tests repeated within a short 
interval. A reliable test score of an individual found at one 
age may be taken as a reasonably safe basis of determining 
what his mental capacity will be at some future age, as- 
suming general similarity of training. 

The relation between intelligence or brightness and the 
mental growth curve is a matter upon which our information 
is still incomplete. We do not know whether the bright
child matures earlier than the average child, or later, or at 
the same time. So far as we can judge from our present 
evidence, children of different levels of intelligence mature at 
about the same age, and the prediction of intellectual ma- 
turity is not seriously affected by the intelligence level. 
Again, we cannot with any certainty infer what the facts 
with reference to mental growth are from our knowledge of 
physical growth. This is a detailed item of information 
that will add to the refinement of the educational applica- 
tion of tests when we have secured it. It will not, however, 
alter the general nature of their application. 


2. Individual differences 


The fact that extreme individual differences in mental 
capacity exist has of course appeared repeatedly during the 
course of our discussion. They were revealed, for example,
in the distribution of the I.Q.s, in the chapter on the Stan- 
ford Revision of the Binet Scale, and illustrations were given 
in the chapter on the tabulation of mental test scores. The 
magnitude of individual differences is so much a matter of 
common knowledge that it is hardly necessary to dwell upon 
its existence. We may give but one illustrative statement. 
According to Terman’s calculation of the percentage of 
children of various I.Q.s, we may calculate the number of 
those of twelve years of age whose general intelligence gives 
them a mental age a specified distance above or below 
twelve years. If we tested all twelve-year-old children we 
would find that ten per cent of them had a mental capacity 
equal or inferior to that of the average child of ten years and 
two months. At the other end of the scale the brightest 
ten per cent of children would be found to have a mental age
equal or superior to that of the child of thirteen years and
eleven months. In other words, twenty per cent of the children
would be either approximately two years inferior or two
years superior to the average twelve-year-old child. The 
average of the upper tenth and the lower tenth would be 
separated from each other by a space of four years. In a
school of one thousand children, two hundred would belong 
to one or the other of these two extreme groups. 

Our knowledge of differences in general intelligence is 
more complete than our knowledge of differences of special 
intellectual capacities or in the non-intellectual traits, such 
as emotion, will, or moral character. Our methods of meas- 
uring general intelligence are also better developed than are 
our methods of measuring most of these other traits. Our 
knowledge is sufficient, however, to indicate clearly that 
differences in the other traits are of sufficient importance to 
merit serious consideration. Where we cannot as yet meas- 
ure them accurately, we should estimate them to the best of 
our ability. 

Various particular intellectual capacities are to a consid- 
erable extent specialized. A pupil may have high general 
intelligence and yet may be poor in ability to do manual 
work, or may be very deficient in musical capacity. We 
cannot classify pupils in these subjects merely upon the 
basis of a general intelligence test. Furthermore, the pupil 
may be comparatively low in general intelligence and yet 
may have unusually high capacity in some specialized direc- 
tion. This high capacity may constitute the pupil’s chief 
educational and vocational opportunity. To overlook it and 
to fail to give the pupil the appropriate training would be a 
serious blunder on the part of the school. 

Again, traits of character, temperament, or will are im- 
portant factors in determining the pupil’s success in school 
or in life, and require both recognition and training. A 
pupil of very low intelligence cannot, by the exercise of any 
amount of resolution or energy, raise himself above a mediocre
level of academic accomplishment, though he may do a
much better grade of work than the average of his intelli- 
gence group. A person of high intelligence, on the other 
hand, may fail utterly in achievement because of an unstable 
emotional life and a poor adjustment to his social environ- 
ment. 

Besides these individual differences in mental traits, other 
differences frequently affect the pupil’s school work. His 
physical condition may impair his capacity for work. His 
home environment or his childhood associates may be either 
favorable or unfavorable to the development of intellectual 
interests and to consistent achievement. The rate of physi- 
cal growth probably has some bearing upon the child’s 
mental development, and upon his social attitudes. Just 
how important this factor is we do not know. It probably 
has some influence in determining the group with which a 
child can associate upon equal terms. 


3. Correlation between mental tests and other measures of
capacity or achievement 


We have already seen that intelligence tests correlate 
fairly closely with one another, and that it is largely because 
of this that we judge them to measure general mental capac- 
ity or intelligence. Two general intelligence tests, if given 
to a class of fifty to one hundred pupils, usually give scores 
which correlate with one another from .70 to .80. This 
indicates that they are measuring something, and that this 
something is common to the two tests. 
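
For readers who wish to see how such a coefficient is obtained, the following is a minimal sketch in Python of the ordinary product-moment (Pearson) correlation. The scores of the eight pupils are hypothetical and serve only to show the calculation; an actual comparison of two tests would use the fifty to one hundred pupils mentioned above.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and of squared deviations from the means.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("scores on each test must not all be equal")
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical scores of the same eight pupils on two intelligence tests.
scores_test_a = [52, 61, 47, 70, 66, 58, 43, 75]
scores_test_b = [48, 66, 50, 62, 71, 49, 45, 68]

r = pearson_r(scores_test_a, scores_test_b)
print(round(r, 2))
```

A coefficient near 1 means that pupils keep nearly the same rank on both tests, and a coefficient near 0 means that standing on one test tells little about standing on the other.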

The next question is, Does this something which is meas- 
ured by this test agree with what we ordinarily mean when 
we use the term brightness or intelligence? Does the test 
agree with our judgment of intelligence? In general, there 
is agreement, but we find great variation in the correlation 
between tests and the judgment of various individuals. 
The judgments of some persons have very low correlation,
and the judgments of others have comparatively high 
correlation. The correlation between teachers’ judgment 
and mental tests may run from as low as .30 to as high as 
.60, or even higher. Some persons have a clear idea about 
what is meant by general intelligence, and are good judges 
of it; other persons either have a vague idea, or are poor 
judges of individuals. On the whole, the tests, being more 
consistent, are to be relied upon more implicitly than are 
judgments. The variations which are found between the 
various cases in which mental tests are correlated with judg- 
ment may be explained in large part by the differences in 
the training or ability of the judges. 

When we come to the correlation between mental tests 
and educational achievement, the measures with which we 
are dealing are more objective. Several illustrations will 
serve to bring before us the typical facts. Correlations 
between individual mental tests and group tests with composites
of educational achievement are reported by Gates.¹
It will be seen that mental age, as measured by the Stanford
Revision of the Binet Scale, correlates about as closely with
achievement as do scores in the verbal group test. The non-verbal
group tests, however, correlate less closely with
achievement. This is perhaps due to the fact that school
achievement is based more upon verbal than upon non-verbal
types of performance. The very low correlations of
the non-verbal test with achievement in the higher grades are
probably due to the fact that only one group test was used
in these grades, and that one a test which has shown uni- 
formly low correlation with school achievement. In gen- 
eral, we may say that the correlation of intelligence tests 


1 Arthur I. Gates. “Correlation of Achievement in School Subjects
with Intelligence Tests, and Other Variables”; in Journal of Educational 
Psychology, vol. 13, pp. 277-85. 1922. 
TABLE XXVI. THE CORRELATION OF INTELLIGENCE TESTS AND
SCHOOL ACHIEVEMENT


Achievement with     Achievement with     Achievement with
Mental Age           Verbal Group         Non-verbal Group
(Stanford)           Tests                Tests


with composite school achievement in the elementary 
school, as shown in this investigation, is in the neighbor- 
hood of .50. An illustration of the results of the application 
of tests in the high school may be found in a report by 
Proctor.¹ Proctor found the correlation between the I.Q.
and the composite of school marks to be .545. 

Something of the variation in the correlation found in 
different institutions by different investigators may be 
gathered from typical statistics from the college field. At 
Yale, Anderson applied the Army Alpha test to four hundred 
freshmen, and found the correlation between composite 
standing and the test to be .377. The correlations in Table 
XXVII are reported by a committee of the faculty of Stanford
University under the chairmanship of Terman.2

Jordan, at the University of Arkansas, who gave the Army 
Alpha to three hundred and fifteen college students, reports 


1 W. M. Proctor. “Tests in Educational Guidance”; in School and
Society, vol. 8, pp. 473-78, and 502-09. 1918.

2 L. M. Terman et al. Report of Sub-Committee on Scholarship on
Student Ability. Stanford University, 1923.
TABLE XXVII. THE CORRELATION BETWEEN ACHIEVEMENT
IN COLLEGE AND INTELLIGENCE TESTS

Stanford University
  275 freshmen men, scholarship for three quarters ................... .54
  53 freshmen women, scholarship for three quarters .................. .63
  677 freshmen men, scholarship for three quarters ................... .49
  204 freshmen men, scholarship for six quarters ..................... .48
  30 freshmen women, scholarship for six quarters .................... .67
  138 transfer men, scholarship for six quarters ..................... .42
  35 transfer women, scholarship for six quarters .................... .49
Columbia, 199 New Plan men, scholarship for 1 semester ............... .60
Columbia, 111 New Plan men, scholarship for 2 years .................. .67
Columbia, 122 Old Plan men, scholarship for 2 years .................. .50
Mills, 115 women, scholarship for 1 year (?) ......................... .70
Brown, 300 men, scholarship for 1 year (?) ........................... .60
University of California, 273 men and women, scholarship for 1 year (?) .47
Goucher College, 243 women, scholarship for 1 year (?) ............... .60
Trenton Normal School, women, scholarship for 1 semester ............. .56
University of Pittsburgh, 569, both sexes, scholarship for 1 semester  .51


a correlation with college standing of .485. Wood reports a 
correlation between the Thorndike College Entrance In- 
telligence Examination and the two-year scholarship score 
of .594. He reports that the Thorndike examination correlates
.454 with points earned by one hundred and six students.
Colvin reports that in the case of two hundred students
in Brown University the correlation between the
Colvin test and college grades was .60. 

These correlations are typical. The correlation between 
intelligence tests and composite standing of the pupils may 
be said, then, to lie usually between .40 and .60. Probably, 
in the majority of cases the correlation will be found to be in 
the neighborhood of .50, but under very favorable conditions 
it may be somewhat above this. 

The practical meaning of this correlation is that it enables 
us with a moderate degree of accuracy to predict the grade 
of work which a student will do in school or college. Two 
questions confront us in an attempt to evaluate and apply 
this fact. In the first place, how does the accuracy of pre-
diction or the closeness of correlation of intelligence tests 
compare with the predictive value or the correlation of 
previous school work? Consider first the correlation be- 
tween average standing in high school and in college. The 
correlations which have been reported vary considerably. 
Wood reports, in three cases, a correlation between second- 
ary school marks and college scores of .262, .331, and .15. 
Thorndike,2 in an early study, reports correlations between
college entrance examinations and marks in the four college 
years of .62, .50, .47 and .25 respectively. Dearborn, in his 
Wisconsin study, reports a correlation of .80.3 The very 
low correlations reported by Wood are probably due to the 
variation in the marking standards of different institutions. 
They would be very much raised if allowance were made for 
these variations, or if a common standard were used. The 
very high correlation by Dearborn is difficult to explain. 
It probably is not typical, however. We may assume about 
.50 as a typical correlation between high school standing 
and college standing under favorable circumstances. This 
means that standing in high school has about the same 
predictive value for college standing as have intelligence 
tests. 

The second question to be raised is, How are these corre- 
lations to be interpreted? The pupil’s standing in the in- 
telligence test and in school or college work may differ for 
two causes. In the first place, the two may depend upon 
different capacities. In the second place, the inaccuracy 
of the two measures may reduce the correlation between 


1 Ben D. Wood. Measurement in Higher Education, pp. 85, 86. World
Book Company, 1923.

2 E. L. Thorndike. “An Empirical Study of College Entrance Examinations”;
in Science, vol. 23, pp. 839-45. 1906.

3 W. F. Dearborn. The Relative Standing of Pupils in the High School and
College, p. 21. Bulletin 312 of the University of Wisconsin, 1909.
them. We have already seen that the correlation between
two intelligence tests, which presumably measure about the 
same ability, is between .70 and .80, rarely going beyond the 
second figure. This represents roughly the correlation we 
get when inaccuracy is the chief disturbing factor. A com- 
parable figure for school marks may be found in the correla- 
tion between the grades of students in the freshman year and 
the sophomore year in college. This correlation is reported 
by Wood to be .72.1 From this it appears that the composite
of a year’s marks, at least at the college level, and presum- 
ably at the high school level, is almost as accurate as intelli- 
gence test scores. If intelligence tests and marks measure 
exactly the same thing, then, we should expect them to cor- 
relate with one another about as closely as the marks of one 
year correlate with the marks of another, or as one intelli- 
gence test correlates with another, namely, between .70 and 
.80. When correlations are lower than this, we may con- 
clude that the marks and the tests measure somewhat 
different capacities. Marks, for example, depend not only 
on intelligence, but also upon previous training, industry, 
and interest. The lower correlation between the high school 
and college marks than between marks of two college years 
may be ascribed either to the fact that high school work and 
college work, being of a somewhat different character, de- 
mand somewhat different abilities, or to the fact that the 
marking standards of different institutions vary so largely 
that there is larger error in comparing them than in compar- 
ing the marks of different courses in the same institution. 
Entrance examinations, it may be noted in passing, have 
about the same correlation with college grades as does the 
intelligence examination. 
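
The reasoning of this paragraph can be put in numerical form. The sketch below is an illustration only and is not taken from the studies cited. It assumes the classical relation, due to Spearman, that two imperfect measures of the same capacity cannot be expected to correlate more highly than the geometric mean of their reliabilities, and it treats the correlation of one test with another (.70 to .80) and Wood's year-to-year correlation of marks (.72) as those reliabilities. The function names are hypothetical.

```python
import math

def ceiling(rel_x, rel_y):
    """Highest correlation expected between two measures of the same
    capacity, given their reliabilities (classical assumption)."""
    return math.sqrt(rel_x * rel_y)

def corrected(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation: the correlation estimated
    for measures free of chance error."""
    return r_xy / ceiling(rel_x, rel_y)

rel_marks = 0.72                      # freshman marks with sophomore marks (Wood)
for rel_test in (0.70, 0.80):         # one intelligence test with another
    print(rel_test,
          round(ceiling(rel_test, rel_marks), 2),
          round(corrected(0.50, rel_test, rel_marks), 2))
```

On these assumptions the ceiling lies at roughly .71 to .76, so that an observed correlation of .50 falls well short of it. This agrees with the conclusion that marks and tests measure somewhat different capacities.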

1 Ben D. Wood. Op. cit., p. 133.

In order that we may have before us in somewhat more
concrete form than is represented by the correlation coefficients
the relationships between the intelligence test
score, the marks on entrance examinations, and the average 
mark of previous school work on the one hand, with stand- 
ing in college on the other, three correlation tables are pre- 
sented. The illustrations are selected from high school and 
college work, because our data for this level are more com- 
plete than for the earlier grades. The relationship is similar, 
however, to the relationship between standing in the elementary
school and in the high school.1 The entries of
these correlation tables represent percentages rather than 
numbers of cases. They are so arranged that each column 
and each row add up to 100 per cent. The tables are to be 
interpreted thus: In the case of Table XXVIII A, 49 per 
cent of the pupils who are in the lowest quarter of the class 
in their standing in the intelligence test, represented by the 
first column to the left, are also in the lowest quarter in their 
standing in college, represented by the horizontal row at the 
bottom. By running up the first column on the left, we find
that 38 per cent who are in the lowest quarter of the intelli- 
gence test, are in the second quarter in their college stand- 
ing, 10 per cent are in the third quarter, and 2 per cent are 
in the top quarter. The percentages in the squares along
the diagonal from the lower left to the upper right-hand
corner represent students who stand in the same quarter
according to the tests and to their college marks.

1 Cf. on this point J. A. Clement. Standardization of the Schools of
Kansas. University of Chicago Press, 1912.

Suppose, now, we were to use the intelligence test as a
means of prediction and of classifying the students into
sections. If four sections were formed, from 31 per cent to
57 per cent of each group would be properly placed as judged
by their marks, from 18 per cent to 38 per cent of each group
would be in a section one removed from their proper place,
from 6 per cent to 16 per cent would be in sections two removed,
and from 2 per cent to 8 per cent would be in sections
three removed from the correct one. The correlation of .50, 
which was found in this instance, represents the degree of 
accuracy in prediction and in classification shown in this 
table, when the classification is made into four groups. If 
classification were made into three groups, the accuracy 
would be somewhat higher. This gives us an idea of the 
practical value of the intelligence test and of other means of 
predicting the pupil’s standing and of classifying him. 
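
What a correlation of .50 implies for classification into quarters can also be examined by simulation. The sketch below is an illustration under stated assumptions and does not reproduce the observed table: it supposes that test scores and college marks follow a normal correlation surface, draws an artificial class of students, and tabulates the share of each test quarter that falls in each quarter of marks. The function name, the sample size, and the seed are hypothetical choices.

```python
import random

def quartile_table(r, n=20000, seed=1):
    """Simulate n students whose test scores and marks correlate about r
    (normal correlation surface, an assumption) and cross-tabulate quarters.
    table[i][j] is the share of test quarter i found in mark quarter j,
    with 0 the lowest quarter and 3 the highest. n must be divisible by 4."""
    rng = random.Random(seed)
    tests, marks = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = r * x + (1 - r * r) ** 0.5 * rng.gauss(0, 1)
        tests.append(x)
        marks.append(y)

    def quarters(values):
        # Rank the students and cut the ranking into four equal groups.
        order = sorted(range(n), key=lambda i: values[i])
        q = [0] * n
        for rank, i in enumerate(order):
            q[i] = rank * 4 // n
        return q

    q_test, q_mark = quarters(tests), quarters(marks)
    table = [[0.0] * 4 for _ in range(4)]
    for a, b in zip(q_test, q_mark):
        table[a][b] += 4 / n          # each test quarter holds n/4 students
    return table

table = quartile_table(0.50)
for row in table:
    print([round(100 * v) for v in row])
```

Under these assumptions the largest entries lie along the diagonal and the smallest in the opposite corners, which is the general pattern described above. The exact percentages depend on the artificial sample and need not agree with those observed.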

The criterion which is here used is accuracy of prediction, 
but accuracy of prediction is not the only criterion to use. 
The intelligence test, by measuring somewhat different 
capacities from those which are measured by school marks, 
may give us a partial basis for analyzing and explaining a 
pupil’s achievement or failure. 


4. The value of mental tests as measures of particular factors
in achievement


There is evidence that intelligence tests measure certain 
components of the abilities required in school work more 
than they do other components. This has been shown by 
analyses of the causes of failure of students. In many cases 
it is found that the failure is not caused by lack of intellec- 
tual ability but by other deficiencies. In such cases the in- 
telligence test enables us to determine whether or not the 
failure is due to intelligence deficiency or whether it is neces- 
sary to look to some other cause. Statistical evidence that 
the intelligence score is a measure of only one of the various 
components required in school achievement is found in a 
study by Pressey, which may be taken as an example.1
Pressey studied 116 junior high school students with the 


1 S. L. Pressey. “An Attempt to Measure the Comparative Importance
of General Intelligence, and Certain Character Traits in Contributing to
Success in School”; in Elementary School Journal, vol. 21, pp. 220-29. 1920.
purpose of finding out the factors in their school success.
His method was to have the students rated by the teachers 
on health, school attitude and preparation, and intellectual 
ability, and then to correlate these various ratings with 
marks. A significant finding was the partial correlation between
marks and ability, on the one hand, and between
marks and school attitude, on the other hand. The partial
correlation of ability with marks was .49, and of school attitude
with marks was .43. This means that if the pupils were
all equal in school attitude, marks would correlate with ability
to the extent of .49. If they were all equal in ability,
marks would correlate with school attitude to the extent of
.43. In other words, ability makes a contribution to school
achievement independent of attitude, and attitude makes a 
contribution which is independent of ability. It is desirable 
to have a separate measure of each of them, in order that we 
may analyze a pupil’s performance and determine what 
contributes to his achievement or success. Intelligence 
tests are important, then, because they help make this 
analysis possible. 
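
The computation of a partial correlation may be sketched as follows. The formula is the standard one for the correlation of two variables with a third held constant. The three zero-order correlations are hypothetical values chosen only for illustration and are not Pressey's figures; since his study included a rating of health as well, his coefficients may have held more than one factor constant.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical zero-order correlations (NOT taken from Pressey's study).
r_marks_ability = 0.60
r_marks_attitude = 0.55
r_ability_attitude = 0.35

# Marks with ability, attitude held constant.
print(round(partial_r(r_marks_ability, r_marks_attitude, r_ability_attitude), 2))
# Marks with attitude, ability held constant.
print(round(partial_r(r_marks_attitude, r_marks_ability, r_ability_attitude), 2))
```

Each partial coefficient is smaller than the corresponding zero-order coefficient in this example, because part of the apparent relation of either factor to marks is due to its association with the other.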


II. THE USES OF MENTAL TESTS IN SCHOOL

1. The general intelligence level and its relation to achievement 

While not very much use has been made of the fact, we 
have clear evidence that there are noticeable differences 
between communities in the average standing of their chil- 
dren in intelligence tests. We may cite merely one example 
reported by Pintner.1 Pintner gives the comparative rating
of the children in a town in Ohio and one in Kansas. The 
median mental index of the children in the Ohio town is 40, 
which is ten below normal, while the median index of the 
Kansas children is 51, which is one above normal. Similar 


1 R. Pintner. Intelligence Testing, p. 239. New York: Henry Holt & Co.,
1923.
marked differences have been found between rural children
as a group and city children as a group, and also between 
children in one section of a city and in another section of the 
same city. Such differences as these may be due partly to 
native or inborn differences in capacity, and partly to dif- 
ferences in early training, but in any case they represent 
differences in present capacity to do school work. They 
are therefore significant because they constitute one basis 
for the interpretation of the achievement of the children. 
The use of the average intelligence of the children of a 
community to interpret the results of achievement tests 
may be illustrated by an example. In a certain state the 
children of a group of cities were given an intelligence test 
and also the Woody Arithmetic Test. We have the scores 
of the children in comparison with the norms in both the intelligence
and the arithmetic tests. The facts are given in
Table XXIX.

TABLE XXIX. THE SCORES OF CHILDREN IN A GROUP OF CITIES

It is apparent from an inspection of the table that the children
in these cities make scores upon the intelligence tests
only slightly lower than the norm, though in the sixth and 
seventh grades the inferiority is slightly greater than in the 
third to the fifth grade. In the arithmetic test, the scores 
are practically equal to the norm in the third and fourth 
grades, but become inferior in the fifth grade and markedly 
inferior in the sixth and seventh grades. The proportional 
inferiority in the arithmetic test in the upper two grades is 
much greater than the inferiority in the intelligence tests. 
We may therefore conclude that the teaching in these upper 
two grades is less efficient than in the lower grades, or that 
there is less emphasis given to the subject, or that some other 
circumstance operates to lower the children’s achievement 
below what we should expect it to be. Comparisons of this 
sort may be used to locate the spot in the school system 
which needs special supervisory attention. 

It has become commonplace in the reports of school sur- 
veys to point out variations among schools, and among 
classes within the schools, in the achievement of children. 
Similar variations may also be found in the average intelli- 
gence rating of schools or of classes. It is not necessary to 
give illustrations, since the principle is similar to the one 
brought out in the preceding paragraph. It is quite evident 
that when variations in the achievements of children of a 
school or class are found, it will be very helpful in interpret- 
ing the causes of such variation to know the intelligence 
rating of the specified group. While there is danger in 
attempting to determine with too much exactness what the 
achievement of a group of children should be from their intelligence
scores, nevertheless, gross differences can readily
be interpreted by help of them. 

It is possible by means of intelligence tests to secure facts 
which are of assistance to the principal, the supervisor, or 
the superintendent in judging the work of individual teach-
ers. Besides using the test scores to interpret the achieve- 
ment of the pupils under a teacher’s care, they may be used 
to estimate the accuracy of the teacher’s judgment of the 
abilities of pupils and to gain light upon the basis of the 
teacher’s marks. The teacher’s success in handling pupils 
will depend to a considerable extent upon how accurately
she judges their ability. We have seen that different teach- 
ers vary considerably in the accuracy of their judgment. 
To overestimate the capacity of a pupil will result in apply- 
ing undue pressure to him; to underestimate the ability of 
the pupil, on the other hand, may result in the failure to 
stimulate him to as good work as he can do. 


2. Administrative use of mental tests in dealing with
individual pupils

Enough has been said to indicate that the score on a 
mental test is rarely if ever to be taken as the sole basis for 
a decision regarding the pupil. Responsible psychologists 
and educators usually emphasize the fact that mental tests 
are only one means of judging the pupil. Dickson gives the 
following list of items as necessary in order to deal with a 
pupil intelligently:1 (1) chronological age, (2) mental age,
(3) intelligence quotient, (4) grade, (5) accomplishment in 
school work, (6) application or industry, (7) health, (8) home 
environment, (9) nationality and language difficulty, (10) 
special or unusual conditions bearing upon school success. 
The treatment of the individual pupil is always a complex 
problem. Mental tests furnish valuable aid to the solution 
of this problem, but they must always be interpreted in the 
light of all the facts which can be gathered about the pupil. 

1 V. E. Dickson. Mental Tests and the Classroom Teacher, p. 99.
Yonkers-on-Hudson: World Book Company, 1923.

Keeping this principle in mind, we may now list the various
ways in which mental tests may be used in the administration
of the individual pupil.


3. Mental tests as an aid in the determination of the right time 
to enter school 


It is a well-established fact that when pupils enter school, 
at the age of six, they are very differently equipped to do 
successfully the work of the first grade. Out of 76 children 
who were tested in the kindergarten by Dickson, 24 or 31.6 
per cent failed to make normal progress during the subsequent
two years.1 Of 95 children who were tested in the low
first grade, 45, or 47.4 per cent, failed of normal progress 
during the next two years. Of 90 children in the second half 
of the first year who were similarly tested, 60, or 66.7 per 
cent, failed in normal progress. Of these 261 children, 
however, only three of those who had an I.Q. above 110 in 
the test failed to make normal progress. Of the entire 129 
who failed of promotion at least once, 84 had an I.Q. below 
90, and only 32 had an I.Q. between 90 and 109. When it is 
remembered that only 20 per cent of children in general 
have an I.Q. below 90, the preponderance of the retarded 
children in this group is very significant. 
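
The arithmetic of Dickson's figures may be checked as follows. The sketch uses only the numbers quoted in the paragraph above; the variable names are hypothetical, and the comparison of the share of failures with the share of children in general is added here for illustration.

```python
# (children tested, children failing of normal progress), as quoted above.
groups = {
    "kindergarten": (76, 24),
    "low first grade": (95, 45),
    "second half of first year": (90, 60),
}

for name, (tested, failed) in groups.items():
    print(name, round(100 * failed / tested, 1), "per cent failed")

total_tested = sum(tested for tested, _ in groups.values())
total_failed = sum(failed for _, failed in groups.values())

failed_below_90 = 84          # failures with an I.Q. below 90
base_rate = 0.20              # share of children in general with an I.Q. below 90
share_of_failures = failed_below_90 / total_failed

print(total_tested, total_failed)
print(round(100 * share_of_failures, 1), round(share_of_failures / base_rate, 1))
```

Children with an I.Q. below 90 thus make up about 65 per cent of the failures, or somewhat more than three times their share of children in general.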

Several years ago Superintendent Saam, of Council Bluffs, 
made an experiment in which children were promoted from 
kindergarten to the first grade on the basis of their intelligence
quotients.2 The results of the experiment are reported
in the following words: 


In an attempt to check up young children who are promoted 
into the first grade upon the basis of their high quotient, an oral 
reading test similar to the Gray Oral Reading Test was given by 


1 V. E. Dickson. The Use of Mental Tests in School Administration. Board
of Education, Berkeley, California, 1922.

2 Theodore Saam. “Intelligence Testing as an Aid in Supervision”;
in Elementary School Journal, vol. 20, pp. 26-32. 1919.
the primary supervisor in January, 1919, to every child who had
entered the first grade in September, 1918. There were 408 
students tested. Of the 408, 128, or 31 per cent were rated as 
superior readers in this test. Of these 408, 35 had been promoted 
to the first grade at five years of age, because they had a quotient 
of 115, or over. Of the 35 students with a quotient of 115, or over, 
22 or 63 per cent were rated as superior. If conclusions could be 
drawn from this one test, it would be safe to assume that children 
five years old with quotients of 115 or over would do the first-grade 
work better than the unselected six or seven-year-old children. 


It is evident that the bright younger children are capable 
of doing the work of the first grade even better than the 
average six-year-old child. It is further evident that the dull 
older children are incapable of doing successfully the work 
of the first grade or two as the grades are now constituted. 
Should the bright child be accelerated by being put ahead of 
those his own age in school, and the dull child be retarded, 
or should all the children of the same age be allowed to enter 
school together, and then the work be differentiated accord- 
ing to their capacity? This brings us to the question 
whether it is better to advance children of different capaci- 
ties through the school or through the curriculum at different 
rates of speed, or whether it is better to attempt to enrich 
the curriculum for the bright children, and give a simplified 
curriculum to the dull children, but carry them through it 
at the same rate. This is not the place for an exhaustive 
discussion of this administrative problem. We shall recur 
to the problem and attempt briefly to sum up the considera- 
tions for the two types of treatment. 


4. Classification into ability groups 
One method of treating children of different degrees of 
ability at the beginning of their school career, as has already 
been said, would be to allow them to enter when they have 
reached a given mental age. The assumption underlying 
this procedure is that children who are equal in mental age
are able to do the same character and quality of school work. 
This procedure would be the first step in the classification of 
pupils according to mental age. This classification might 
conceivably begin at the first grade and be carried forward 
throughout all the grades, and the proposal has been made 
that this be done. 

There are several difficulties with this classification by 
mental age, which we may call vertical classification, since 
it involves placing the children at a given point in the scale 
of mental development. There is first the practical diffi- 
culty that pupils do not, as a matter of fact, enter school at 
the same mental age. They enter at the same chronological 
age of six, and it is not likely that any wholesale modification 
of this practice will be adopted in the near future. Some 
pupils, then, start with a handicap, others with an advan- 
tage, and it is not possible to bridge the gap between them. 

This one fact alone seems to make necessary a horizontal 
classification of pupils according to their intelligence. This 
horizontal classification is based upon the intelligence quo- 
tient or some other measure of relative brightness, rather 
than upon mental age. It means dividing pupils in the 
first grade, or in succeeding grades, into groups. This pro- 
cedure is illustrated by organizing three groups of pupils, one 
containing the bright pupils, another the average pupils, and 
a third the slow pupils. 

The second and more fundamental reason why classifica- 
tion on the basis of the mental age may not be satisfactory is 
that, even if we should start pupils together in the first grade 
who have the same mental age, they would not remain equal 
in mental age. We saw in the chapter on mental develop- 
ment that, if we accept the ratings which are obtained with 
the Binet scale, children of the same mental age exhibit a 
wider spread in chronological age as they grow older. Children
of high I.Q. gain more rapidly in mental age than
average children, and children of low I.Q. gain less rapidly.
Group tests indicate much less divergence in curves of 
mental age than the Binet scale, but they indicate that there 
is some divergence. 

The third and most serious objection is that a merely 
vertical classification places together in the same group 
children of rather widely divergent chronological ages and 
stages of physiological and social development. A differ- 
ence in chronological age of two years at entrance to school 
would be represented by a still greater difference after the 
children have been in school six years. It is generally 
agreed that a wide divergence in ages of children in the same 
grade is disadvantageous. 
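
The widening of the spread in chronological age may be illustrated by a short computation. The sketch assumes that the intelligence quotient is one hundred times the ratio of mental age to chronological age and that it remains constant as the child grows older, an assumption which, as has been said, the group tests support only in part. The quotients of 120 and 80 are hypothetical.

```python
def chronological_age(mental_age, iq):
    """Age at which a child of the given I.Q. reaches a given mental age,
    assuming I.Q. = 100 * mental age / chronological age stays constant."""
    return 100.0 * mental_age / iq

for mental_age in (6, 12):
    bright = chronological_age(mental_age, 120)   # hypothetical bright child
    dull = chronological_age(mental_age, 80)      # hypothetical dull child
    print(mental_age, bright, dull, dull - bright)
```

On this assumption two children who are classed together at a mental age of six differ by two and one half years in chronological age, and by five years when both have reached a mental age of twelve.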

The horizontal classification of children into groups of 
similar ability may be begun in the first grade, and con- 
tinued throughout the child’s schooling, or it may be begun 
at some later period. Such grouping has been most com- 
monly carried out in the high school. It has recently been 
tried out, however, in the lower grades of the elementary 
school. Probably the most extensive experiment of grouping
at this level is the one in Detroit.1

Homogeneous grouping may be used to provide oppor- 
tunity for proceeding at different rates of progress, or to 
provide enrichment for the bright pupils and a simplified 
curriculum for the slow ones. Segregation is a general 
administrative device which provides the opportunity for 
various sorts of adjustments in curriculum and method. 

The differentiation which consists in taking the brighter 
group through the curriculum more rapidly than the backward
group needs no explanation. It represents a type of


1 Charles S. Berry. “Classification by Tests of Intelligence of Ten
Thousand First-Grade Pupils”; in Journal of Educational Research, vol.
6, pp. 185-203. 1922.
acceleration which does not have the disadvantages attend-
ant on skipping grades. It gives the bright child more diffi- 
cult work than is encountered by children of less ability. 
It also brings the gifted child to the threshold of high school 
and college sooner than has been the accepted age.1

Many, perhaps most, educators and psychologists con- 
sider a qualitative adjustment preferable to this quantita- 
tive one. Instead of varying the rate of advancement they 
consider it better to enrich the course of study for the gifted 
child. This would seem to imply a corresponding impover- 
ishment of the curriculum for the backward child. It may 
be that an advantageous qualitative differentiation can be 
made. For the most part, however, the adjustments which 
have actually been made consist largely in giving the bright 
child the sort of work which is part of the regular course of 
study for a later grade. When it is not this it is often an 
improvement in method which would be suitable for chil- 
dren of all degrees of ability. It is worth while, in spite of 
the comparative lack of success up to the present, to in- 
vestigate further the possibility of making genuinely quali- 
tative adjustments. 

The acceleration of the gifted child so that he enters high 
school and college early is often objected to on the ground of 
possible harm to the youth from associating at that level 
with those who are more mature. There is probably danger
of maladjustment if the youth enters high school or college 
too young. Possibly two years below the usual age is the 
limit of safety in the ordinary case. But we must remember 
that there is a variation of as much as three or four years in 
physiological maturity, and also that intellectual equality 
constitutes part of the basis of association on a common 


1 For a fuller discussion of this method of adjustment, see Frank N. Freeman,
“The Treatment of the Gifted Child in the Light of the Scientific
Evidence”; in Elementary School Journal, vol. 24, pp. 652-61. 1924.
footing. If the gifted child can be accelerated two years or
so without suffering social maladjustment it is a decided 
advantage, for it is such children who are proper candidates 
for professional training, and the reduction of age of entrance 
upon the professions would be very beneficial. 

Another type of reorganization which is made for the 
purpose of adjusting the work of the school to differences in 
ability is individual instruction. Instead of treating the 
group of relatively homogeneous ability as a progress unit, 
each individual is treated as a distinct unit. The plan is 
ordinarily confined to the “tool” subjects such as reading, 
writing, arithmetic, and spelling. According to the view 
of the chief advocates of individual instruction, however, 
mental tests are not of much value in predicting or control- 
ling rates of progress, since there is not much correlation be- 
tween intelligence scores and rates of progress, or indeed 
between rates of progress in the various subjects themselves.1
While this is probably an exaggeration, it is true that mental 
tests are of less use in individual instruction than in ability 
grouping. We shall therefore not discuss individual in- 
struction further. 

Mental tests should not be used as a sole basis either for
determining the age of admission of the child to school, or for
classifying him into ability groups. The various other facts
about the child which have already been mentioned should 
be taken into account. If a child is unusually large for his 
age, this fact should weigh in favor of promotion, or of 
classification with an advanced group. If he is unusually 
small, this fact should weigh in the opposite direction. Good 

1 See Carleton W. Washburne. “The Attainments of Gifted Children
under Individual Instruction”; in The Twenty-Third Yearbook of the
National Society for the Study of Education; Part I, The Education of Gifted
Children, pp. 247-61. 1924.


See also the Twenty-fourth Yearbook of the same society, Part II, 
Adapting the Schools to Individual Differences. 1925. 
health, again, should weigh in favor of advancement, and
poor health of holding back. Individual cases should be 
dealt with in the light of all the facts. 

A word should be said about the procedure to be followed 
in giving and interpreting a test for the purpose of classifica- 
tion or of promotion. A rough or basic classification may be 
made by means of one or more group tests. Two group tests 
are more valid than a single test, and where it is possible 
it is advisable to give two tests and take both scores into 
consideration. If the scores agree, they indicate that the 
rating of the child is fairly reliable. If they disagree, it is 
necessary to secure additional evidence before coming to a 
decision. The same may be said concerning the relation of 
a child’s score in an intelligence test to his standing in school 
work, in case a previous record is available. If the child’s 
intelligence rating agrees with his educational rating, the 
intelligence rating thereby receives some confirmation. On 
the other hand, if the two disagree, the two ratings may or 
may not be reliable. It is of course possible that the child 
is bright and lazy, or suffers from some handicap which im- 
pairs his school achievement. Before concluding that this 
is the case, however, we should make sure that the intelli- 
gence rating is an accurate one. This means that additional 
tests should be given in order to confirm the rating of the 
first test. Similar confirmation is desirable when the rating 
of the test disagrees with the judgment of the teacher con- 
cerning the pupil’s intelligence. 
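The rule of agreement between two group tests may be put in the form of a short sketch. It is an illustration only. The tolerance of ten points, the function names, and the wording of the two outcomes are assumptions which do not come from the text, and the two ratings are supposed to be expressed on a common scale such as the I.Q.

```python
# A minimal sketch of the two-test rule described above.
# DEFAULT_TOLERANCE is hypothetical: the text names no figure for how far
# apart two ratings may lie and still be said to agree.
DEFAULT_TOLERANCE = 10.0


def ratings_agree(first, second, tolerance=DEFAULT_TOLERANCE):
    """Return True when two ratings on the same scale lie within the tolerance."""
    return abs(first - second) <= tolerance


def classification_step(group_test_a, group_test_b, tolerance=DEFAULT_TOLERANCE):
    """Suggest the next step after two group tests have been given."""
    if ratings_agree(group_test_a, group_test_b, tolerance):
        # Agreement indicates that the rating is fairly reliable.
        return "accept rating provisionally"
    # Disagreement calls for additional evidence before a decision is made.
    return "secure additional evidence"


if __name__ == "__main__":
    print(classification_step(104, 109))
    print(classification_step(88, 112))
```

The same comparison might be applied to the intelligence rating and the educational rating, or to the test rating and the teacher's judgment, provided the two are first brought to a common scale.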

It is desirable, furthermore, to check up the rating of the 
intelligence test in the case of children at the upper or lower 
extreme, and possibly also in the case of children at the 
border line between two groups. It is particularly desirable 
to be sure of the rating of a child who receives a very low 
score. A low score is more likely to be due to an error than 
a high score. A child may fail to do himself justice because 
he does not understand the directions, because he is in poor
physical condition, because he is emotionally overwrought 
by the test, or by some experience just previous to the test. 


5. The use of tests in selecting children for special classes 


There are two types of special classes for which children 
may be selected on account of their intellectual capacity. 
The one is the class for backward children, and the other the 
class for gifted children. Such special classes usually differ 
from the homogeneous groups which have been mentioned 
in that they represent more extreme differences. Special 
classes for backward children include not only those who are 
somewhat slower in learning than the average, but those who 
are so defective that they cannot, even with more time, 
master the ordinary curriculum of the school. The group 
of gifted children represents those who can either proceed 
at a very accelerated pace, or perhaps require a radically 
different type of treatment from the majority of children. 
These special classes may contain from five per cent to ten 
per cent of the children at either extreme. 

The use of tests in selecting children for these special
classes does not differ in principle from their use in selecting 
for other purposes. The chief difference is that somewhat 
greater care should be exercised, particularly in selecting 
children for the lower grade classes. In the majority of 
cases, to be sure, children at either extreme stand out more 
prominently and are more easily identified than those in the 
middle of the scale. The necessity of additional care rests, 
then, not upon the difficulty of identifying children, but upon 
the practical importance of avoiding mistakes in their selec- 
tion. Because of the desirability of accuracy in selection, 
and because of the smaller numbers which are concerned, the 
final selection should probably be made upon the basis of 
individual examination. 


6. The use of tests in educational guidance


When the pupil arrives at the point at which election 
among courses of study or subjects is possible, the advisor 
of a student may use intelligence tests to good advantage in 
his guidance. An illustration of the success of advice made 
with the help of tests is given in an experiment by Proctor.1
Proctor gave intelligence tests to a group of eighth-grade 
pupils about to enter high school. The type of advice which 
was then given may be illustrated from two cases. 


CARD No. 3

Roe, Richard
Chronological age: 14 yrs., 4 mos.
Score, Army Scale: 150
Army Scale mental age: 17 yrs., 4 mos.
Stanford-Binet mental age: 16 yrs., 9 mos.
Army Scale I.Q.: 120
Stanford-Binet I.Q.: 117

High school subjects which pupil desires to take: English, History,
Algebra, French.

Educational plans: To finish high school then attend a university
or the U.S. Naval Academy.

Vocational ambition: Chemical engineer or naval officer.

Grade of work done in elementary and intermediate schools: Very
poor. Estimated as “average” by some grade teachers, and as
“below average” by others.

Comment of examiner: Boy has ability but needs to be waked up.
Suggest that he take general science in place of history for first
year. Also suggest that he be placed in first division in algebra
where he will have to work. He will need to develop ability in
both science and mathematics if he is to follow his vocational
ambition.


1 W. M. Proctor. “Psychological Tests and Educational Guidance”; in
Journal of Educational Research, vol. 1, pp. 369-81. 1920.


CARD No. 4

Brown, Carrie
Chronological age: 15 yrs., 7 mos.
Score, Army Scale: 100
Army Scale mental age: 14 yrs., 0 mo.
Stanford-Binet mental age: 14 yrs., 2 mos.
Army Scale I.Q.: 89
Stanford-Binet I.Q.: 90

High school subjects which pupil desires to take: English, Algebra,
Latin, Typing, Drawing.

Educational plans: Go to Mills College.

Vocational ambition: To be a chemist.

Grade of work done in intermediate and grammar schools: Grades
in 8A class only fair, even in work that is being repeated.
Estimates of elementary and intermediate teachers: “slow” but a
conscientious worker.

Comment of examiner: Should be discouraged as to taking Latin.
Algebra doubtful, but if she insists in view of desire to go to
college assign to second division.
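The quotients entered on the two cards may be checked against the mental and chronological ages which accompany them. The sketch below assumes the usual ratio definition of the I.Q., one hundred times the mental age divided by the chronological age. The function names are hypothetical, and the cards do not state how the printed figures were rounded.

```python
def to_months(years, months=0):
    """Convert an age given in years and months to months."""
    return years * 12 + months


def ratio_iq(mental_age_months, chronological_age_months):
    """Ratio I.Q.: 100 x mental age / chronological age (assumed definition)."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * mental_age_months / chronological_age_months


if __name__ == "__main__":
    # Card No. 3, Stanford-Binet: mental age 16-9, chronological age 14-4.
    print(round(ratio_iq(to_months(16, 9), to_months(14, 4)), 1))
    # Card No. 4, Army Scale: mental age 14-0, chronological age 15-7.
    print(round(ratio_iq(to_months(14, 0), to_months(15, 7)), 1))
```

For Card No. 3 the Stanford-Binet ages give a quotient of about 116.9, which rounds to the 117 printed on the card. The other three quotients on the two cards come out within about a point of the printed values; the difference presumably lies in the rounding practice, which is not stated.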


The experiment justified itself in the larger retention of
pupils in high school and in the reduction of failures, as is
shown by Table XXX. Provided the precaution is taken
of including other facts as determining factors besides the
intelligence tests alone, and the counsel is given in the form
of advice rather than compulsion, there are very large
possibilities in the use of intelligence tests for this purpose.


TABLE XXX. COMPARATIVE FACTS REGARDING “GUIDED” AND
“UNGUIDED” GROUPS OF HIGH-SCHOOL PUPILS

(Rows: Guided, Unguided. The column headings and figures are
illegible in the source.)

The next step in the counseling of pupils is to advise them 
regarding the selection of major courses of study, as con- 
trasted with individual subjects. As the various subjects 
differ among themselves in the demand which they make, so 
the courses differ. It has been demonstrated that the so- 
called vocational or commercial courses in the high school 
demand less general intellectual capacity, or less of the 
capacity which is measured by our tests, than do the aca- 
demic or college preparatory courses. This is shown by the 
fact that the students in the vocational courses have a con- 
siderably lower standing than those in the other courses. 
Furthermore, there is less correlation between the standing 
in commercial or vocational work and the scores on the tests 
than between the standing in academic work and the scores 
on these tests. The tests may be used in exactly the same 
way in advising pupils which course to take, then, as in ad- 
vising them which subject to take. 

Another phase of educational guidance has to do with the 
length of time a child shall remain in school. There is, in 
fact, as will be shown in Chapter XVII, a close correlation 
between the amount of schooling a child receives and his 
intelligence, or between the age of dropping out of school 
and intelligence. The school at present acts in a measure as 
a selective agency. The brighter pupils remain longer, and 
the duller ones drop out sooner. This correlation is un- 
doubtedly due in part to the fact that the larger amount of 
schooling causes a higher score in the mental test. After we 
have made due allowance for this fact, however, there re- 
mains a considerable degree of correspondence due to the 
fact that the higher levels of the school demand more in- 
telligence than the lower ones. 

This fact has been taken by some as a criticism of the
school. The school, according to these critics, should fur- 
nish a type of training at every level up to and including the 
college which is adapted to every range of ability. This 
would involve the addition to the types of training given in 
the college, and even in the high school, of work which is 
more largely manual in nature and which demands less 
abstract thinking. How far it is the function of the high 
school and the college to add to the present type of work 
courses of this character is a matter of broad educational 
policy which is not our problem here to decide. It is the 
opinion of the writer that there is a distinct limitation upon 
the desirable extension of full time high school and college 
work in this direction. It is probably desirable that there 
should be a very large extension of part time and continua- 
tion work, extending even to adult education, in order that 
those who are not fitted to continue indefinitely the full 
work of these higher institutions may add to their elementary 
school training further education suited to their capacity, 
so as to fit them to perform the duties of citizenship and to 
develop habits of making a wholesome use of their oppor- 
tunities for recreation. Continuation training should also 
serve to improve vocational fitness, including home making 
on the part of women. 

If this conception of the function of secondary and higher 
education is accepted, a distinction will have to be made 
between individuals, based upon their fitness to continue 
their education to the higher levels of the high school and 
college. Even if the high school and college should ulti- 
mately be reorganized so as to provide full time training to 
suit the capacities of everybody, they are not so organized at 
the present time and it is now necessary to make a distinc- 
tion. Some individuals are capable of continuing through 
high school and through college, while others are not. In 
addition to the evidence from the correlation between the
amount of schooling and intelligence we may cite illustrative 
findings on the relation between intelligence tests and the 
probability of success in high school and college. Terman 
estimates that an I.Q. of 90 is necessary for successful high 
school work, and an I.Q. of 100 for successful college work. 
Standards of this sort may be taken for general guidance, 
and a student may be advised as in the case of the selection 
of subjects or of courses. In all cases, of course, the charac- 
ter of the student’s previous work in school is to be taken 
into account as well as his intelligence test score. 

In the case of the college student, the question arises 
whether he shall continue beyond college into the profes- 
sional school, and if so, which profession he shall select. 
It is coming to be more commonly necessary to make this 
decision before the end of the college course. At least it is 
advantageous to the student to do so, because of the fact 
that he can, in his last two years of college, take courses 
which are preparatory to his professional studies. It is not 
clear that professional schools as a whole require a higher 
degree of intelligence than does the liberal arts college. In 
fact, the students in certain professional departments ap- 
pear to have a lower average standing than do the liberal arts 
college students. It may be that the chief form of guidance 
at the college level should consist in advising students which 
type of professional preparation to select, in case they wish 
to go forward into a professional school. The data which 
have been gathered in two universities and which are shown 
in Table XXXI seem to give some color to this suggestion.

The type of educational guidance which has been described 
leads naturally to vocational guidance, and, in fact, con- 
stitutes the preparatory stages of vocational guidance. A 
discussion of the further use of tests in more direct vocational 
guidance, whether in or outside the school, will be reserved 
for a separate chapter. 

TABLE XXXI. STANDING IN THE ARMY ALPHA TEST IN
DIFFERENT SCHOOLS WITHIN A UNIVERSITY


UNIVERSITY                  SCHOOL OR DEPARTMENT             MEDIAN SCORE

Ohio State University 1     Arts, Commerce and Journalism    147
                            Medicine                         142
                            Law                              142
                            Engineering                      141
                            Education                        137
                            Agriculture                      133
                            Arts                             133
                            Pharmacy                         125
                            Dentistry                        115
                            Veterinary Medicine              2 (figure incomplete in source)

University of Illinois 2    Graduate School                  154
                            Literature, Arts and (illegible) (illegible)
                            Commerce                         143
                            Agriculture                      139


1 E. L. Noble and G. F. Arps, “University Students’ Intelligence Ratings According to
Army Alpha”; in School and Society, vol. 11, pp. 233-37. 1920.
2 Yoakum and Yerkes, Army Mental Tests, p. 17. New York: Henry Holt & Co., 1920.

7. The use of tests in maintaining the adjustment of a pupil to 
his work 

The necessity of adjustment is most evident in the case 
of the pupil who fails in one or more of his subjects. Tests 
have been used in such cases to assist in the analysis of the 
failure and in the determination of the type of the remedial 
work or other treatment which is necessary to overcome the 
failure and to prevent its return. If the pupil’s failure is due 
to lack of ability, this will be revealed by the test. If the 
test does not show lack of ability, the cause must be looked 
for elsewhere. In some cases, as, for example, in reading, 
the child may possess sufficient general ability, but may have 
a special disability. In such cases it is necessary to apply
special ability tests as far as they may be available. Special 
ability tests are most highly developed in the field of music. 
In the case of other subjects it is necessary to determine 
whether the child possesses special disability by means of 
tests of achievement in the subjects themselves. If the 
child’s general ability is normal, but he fails in particular 
subjects, we should first exhaust the possibility that this 
failure is due to lack of interest or to ineffective training at 
some point in the child’s previous school career. Only when 
the failure cannot be explained on one of these grounds 
should we resort to the explanation of a special disability. 
Bronner discusses this whole question of special ability and 
disability, and suggests the possible use of tests to discover 
them.1 The teacher or the supervisor will not find that our
present tests, however, are of very much value for this 
purpose. 

If the cause of the failure has been determined to be lack 
of previous training or a special disability, remedial treat- 
ment is required. In the first case the remedial treatment is 
comparatively simple, since it merely consists of giving the 
child the training which he has missed in his previous career, 
or which has been inadequately given. Nothing new in the 
nature of training is necessary. If a special disability is to 
be overcome, on the other hand, considerable ingenuity must 
be exercised in order to find the type of training which will 
meet the peculiar necessities of the case. The subject in 
which the largest amount of work in the training of children 
with special disabilities has been done is reading. For a 
further account of the methods to be used the reader may be 
referred to the monograph on the subject by W. S. Gray.2


1 A. F. Bronner. Psychology of Special Abilities and Disabilities. Boston:
Little, Brown & Co., 1917.


2 W. S. Gray. Remedial Cases in Reading: Their Diagnosis and Treatment.
University of Chicago, Supplementary Educational Monograph. 1922.

Failure must always be interpreted in a comparative
sense. The child who receives passing grades in his work 
may be failing as much as the child who receives a grade 
below passing. Failure is to be considered in relation to the 
pupil’s capacity. The very bright pupil who is doing medi- 
ocre work is failing in so far as his achievement falls below 
his capacity. Mental tests may therefore be used to stimu- 
late brighter pupils to work up to their capacity. In some 
cases, pupils present problems in conduct because the work 
which they are doing is so far below their capacity that it 
does not enlist their interest or engage their energy. Such 
pupils have sometimes become well behaved by being pro- 
moted and given work which was more nearly commensurate 
with their capacity. 

If this use of mental tests to stimulate the bright pupils 
is to be successful in the long run, the pupil’s willing
cooperation must be secured and the work of the school must not be
regarded by him as a task. The opportunity to do a high 
grade of work must be looked upon by him rather as a privi- 
lege than as an urgent requirement. Otherwise there is 
danger that, as the pupils become familiar with the uses 
which are made of tests, they will malinger, and the test 
score will fail to represent their true capacity. 


8. The selection of applicants for college or professional school 

As our educational system is now organized, admission is 
not restricted below the level of the college or the professional 
school except in the requirement of graduation from the next 
lower grade. In the college, experiments have been made in
recent years for the purpose of gathering information to 
indicate whether intelligence tests may be used as an ap- 
propriate means for selecting candidates for admission. 
Tests are not yet widely used for this purpose, although they 
are applied in a number of institutions as a matter of record 
and for purposes of later administrative dealing with the
student. The most prominent institution which uses in- 
telligence tests for admission is Columbia University. 
According to the new plan, inaugurated in 1919, four types of 
evidence must be presented by the candidate. These relate, 
first, to preparation; second, to character; third, to health; 
and fourth, to intelligence. If the student is satisfactory in 
the first three respects, he may at his own application sub- 
stitute for the ordinary entrance examination the intelligence 
test. Students who enter by this plan are regarded as 
among the most satisfactory of those in the college. The 
details of the experiences of Columbia with intelligence tests 


are described by Wood.! 


CHAPTER XV


THE APPLICATION OF MENTAL TESTS TO VOCATIONAL 
GUIDANCE AND SELECTION 


THIS chapter will aim to give only a general summary of the
types of application of mental tests to vocational guidance 
and selection. It does not purport to be a complete account 
of the detailed results of the experiments which have been 
made in this field. The reader who is interested in such a 
detailed summary may consult the reviews by Kornhauser, 
Kornhauser and Kingsbury, and Muscio.1 These reviews
contain full bibliographies and mention of the chief results 
which have been found by the application of the various 
types of tests. 


1. Vocational uses of tests; from the point of view of 
the employer 


The chief use to which the employer puts psychological 
tests is in connection with the selection of employees. This 
is a much simpler process than vocational guidance. It is 
necessary only to know the qualifications which are required 
for the particular job or jobs, and then to have tests which 
furnish reliable measures of these qualifications. These 
qualifications may be either of a general nature and require 
general tests, or they may be of a more specialized nature 
and require tests of special capacity. 


1 Arthur W. Kornhauser. “The Psychology of Vocational Selection”;
in Psychological Bulletin, no. 19, pp. 192-299. 1922.

Arthur W. Kornhauser and Forrest A. Kingsbury. Psychological Tests in
Business. University of Chicago Press, 1924.

B. Muscio. Vocational Guidance. (A Review of the Literature.) Report
no. 12 of the Industrial Fatigue Research Board. London: H.M. Stationery
Office, 1921.

A somewhat similar problem which confronts the employer
is the promotion of the employee. Unless he is
promoted solely on the ground of seniority, mental tests 
may be used as a partial basis for promotion. They should 
be subordinate to production records, since achievement 
is the fundamental basis of promotion. There may be some 
cases, however, in which the individual may be fairly success- 
ful on a plane of fairly simple work, but may not possess 
capacity necessary for promotion to work of a higher grade. 
The mental test may serve to indicate the remote possibili- 
ties in the individual’s career. 

Mental tests are perhaps still more suited to a slightly 
different purpose. Employees are frequently shifted from 
one department to another or from one type of work to 
another within an institution. The lines of promotion 
within business organizations are not worked out with very
much system or completeness. It is quite possible, there- 
fore, that an employee may not be particularly suited to a 
performance of one piece of work, but may be well suited to 
another kind of work of the same organization. Tests have 
been used as a means of finding this out. 


2. From the point of view of the individual 

The application of mental tests to the individual, in 
order to determine what vocation he is best fitted for, is a 
much more complex affair than the selection of employees 
for a given position. Complete guidance requires that we 
have an inventory of all the individual’s capacities which 
may be of significance for success in the vocations, and also 
that we have tests which are suitable to measure the capac- 
ities necessary in the different vocations. 

Before we have the facilities for making the complete 
diagnosis and giving the complete guidance already men- 
tioned, it is possible to furnish guidance of a much more 
limited sort with a more restricted collection of tests. For
example, suppose that an individual has interest in and an 
appreciation for music. Such a person may wish to study 
music, but may hesitate because he is uncertain whether he 
possesses the capacity necessary to make a success in this 
field. In such a case the mental test of musical ability may 
serve to assist the individual in making a decision. If the 
test is favorable, the prospective musician may undertake 
training with some confidence. If it is unfavorable, he may 
turn to some other possibility. The choice of another vo- 
cation, of course, will be subject to the same uncertainty as 
would the choice of music if he had not had a test, but the 
objective determination of his fitness or unfitness for one of 
the possibilities which he may be considering at least re- 
duces the complexity of his problem. 

If it proves to be possible to chart the special abilities 
required by the most important vocations, and to develop a 
system of all-round tests for these abilities, it may ultimately 
be possible, by means of a single elaborate series of tests, to 
designate those occupations or classes of occupations for 
which the individual is fitted, and those for which he is 
unfitted. We are far from either having such an inventory 
or possessing such a system of tests, and some psychologists 
are skeptical concerning even the remote possibility of pro- 
viding this type of guidance. It is a type of guidance which 
is clearly thinkable, however, and psychologists are engaged 
upon research which looks in the direction of its realization. 

The forms of guidance already mentioned have to do 
with tests of special abilities. Tests of general ability, or 
tests of intelligence, may also be used for this purpose. If the 
assumption made by some vocational guidance experts, 
such as the late Mr. Weaver, of New York, is correct, oc- 
cupations may be classed into groups according to the 
amount of intelligence required for their successful
prosecution. By the application of a general test, then, the group
of occupations for which the individual’s intelligence fits 
him may be determined. According to the assumption which 
is customarily made, if a person gets into an occupation 
which requires a higher degree of intelligence than he 
possesses, he will fail to measure up to its requirements. 
If, on the other hand, he gets into an occupation which 
has requirements below his capacity, he will become discon- 
tented, and society will lose some part of his potential 
achievement. We shall attempt to estimate the possibilities 
of this and the other types of guidance somewhat more 
critically in a later section. 


3. The validation of vocational tests


The validity of a test for the measurement of a particular 
vocational aptitude has in some cases been determined by 
the method of analysis. The person who devised the test 
attempted to analyze the ability required in the perform- 
ance of the vocation, and then invented methods of measur- 
ing this ability. In some cases, the inventor of the test has 
not even taken the trouble to correlate the standing of the 
test with achievement in the vocation. For example, 
Muensterberg published his test for ship pilots without hav- 
ing made any correlation between standing in the test and 
skill in the occupation. Little credence is now given to such 
subjective validation, and it is not necessary to dwell longer 
on this method. 

The only satisfactory method for validating vocational 
tests is to correlate them with some measure of performance 
in the vocation. This requires that we shall find some 
satisfactory measure of achievement in the vocation, and 
that we shall be able to apply this measure in such a way that 
achievement will represent aptitude or native capacity, 
rather than training or experience. Satisfactory measures 
which meet these requirements are not easy to find. Perhaps
the simplest procedure, when it can be applied, is to
give the test which is to be validated to a group of individuals
before they have entered upon their vocational career.
These individuals are then followed and their success
measured. This success is then correlated with the standing
in the test. When the test is applied to individuals who
have already had considerable experience in the vocation, 
it is necessary to allow for differences in amount of training 
or amount of experience in making the rating of their suc- 
cess. 
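The correlation of test standing with later success may be illustrated by a short computation. The sketch below uses the ordinary product-moment coefficient, which is an assumption, since the text does not say which coefficient is intended. The scores and success ratings are invented for the illustration and are not taken from any study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list does not vary")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Invented figures: test scores taken before entering the vocation,
    # and a later measure of success for the same six individuals.
    test_scores = [52, 61, 47, 70, 66, 58]
    later_success = [3.1, 3.8, 2.9, 4.4, 3.6, 3.5]
    print(round(pearson_r(test_scores, later_success), 2))
```

Where the individuals differ in length of experience, the success measure would first have to be adjusted for that difference, as the paragraph above requires; the sketch makes no such adjustment.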

A few examples may be cited of methods which have been 
followed in gauging vocational success. The first of these 
uses as a measure of success a very general criterion, namely, 
general eminence in one’s profession as indicated by the 
presence of one’s name in Who’s Who. This method is not 
at all precise, since the presence of one’s name in Who’s Who 
depends to a considerable extent upon accidental circum- 
stances. In the field of science, a more reliable indication 
is the starring of one’s name in Cattell’s American Men of 
Science. The selection of the names to be starred is based 
upon the carefully canvassed judgment of the individual’s 
colleagues. Using either of these criteria, it will be neces- 
sary to take into account the time required to secure the 
prominence sufficient to warrant the inclusion of one’s name.
A person usually does not gain the reputation necessary to 
have his name upon either list within fifteen years after 
graduation from college. 

Other criteria which have been used are salary and posi- 
tion. These again are subject to accident and to differences 
in length of service. For example, in the field of education, 
college and university positions usually bring a somewhat 
lower salary than do administrative positions involving 
similar experience and training. Such accidents as these, or 
others that might be mentioned, lower the correlation which
would exist if the measure of vocational achievement were a 
perfect one. 
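The lowering of the correlation by an imperfect criterion is commonly expressed by Spearman's relation for attenuation. It is a standard result supplied here for illustration and is not a formula given in the text. It applies only so far as the accidents in question behave like chance errors of measurement.

```latex
% r_{xy}   : correlation actually observed between test and criterion
% r_{x'y'} : correlation which would exist if both were measured without error
% r_{xx}, r_{yy} : reliability coefficients of the test and of the criterion
r_{xy} = r_{x'y'} \sqrt{r_{xx}\, r_{yy}}
```

If, for example, the reliability of the test were .81 and that of the criterion .64, the observed correlation could be no more than .72 of the correlation between perfect measures, since the square root of .81 times .64 is .72.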

Within a given organization the rate of promotion may be 
taken roughly as a measure of success. In an organization 
in which promotion is very flexible this is a more delicate 
measure than in those in which seniority is accepted more 
largely as the basis for promotion. The advantage of this 
criterion is that it is objective and that it is based upon the 
record of the individuals over a period of time, and is there- 
fore not affected by differences in length of experience. The 
judgment of superior officers has sometimes been taken as 
the measure of success. If the judgments are made care- 
fully this is a significant measure, since promotion is based 
upon such judgment. Studies of the variability of ratings,
however, have indicated that they are subject to consider- 
able unreliability; hence, if a more objective measure can be 
found, it is to be preferred. 

A more objective measure which may be used in certain 
cases is the production record. This can more easily be 
secured in a mechanical operation than in higher clerical or 
executive positions. In some forms of factory production it 
can be applied with great exactness. The studies of Link, 
which are reported below, contain illustrations of the effec- 
tive use of this criterion. A similar criterion which has been 
used is the sales record of salesmen. 


4. Types of tests of vocational aptitude

In this section the various types of tests will be briefly 
described. Only the outstanding types can be mentioned, 
and only one or two illustrations of each of these types can 
be specifically mentioned. In the succeeding section the result 
of the application of these tests will be reviewed in connection
with the types of jobs in which they have been applied.


5. General intelligence tests, or measures of general ability


Measures of general ability may be used in a very broad 
way as a basis for vocational guidance, or they may be used 
in a narrower fashion for selection for particular vocations. 
In the first instance we attempt to determine the intelligence 
requirements of various occupations, or groups of occupa- 
tions, and with this as a basis decide those occupations for 
which the individual is fitted by his general intelligence level. 
In the second instance we first establish an intelligence re- 
quirement for a given vocation, and then choose from among 
applicants those who meet this requirement and reject
those who do not. A number of illustrations of the use of 
intelligence tests in selection will be given in the next section. 
We may consider the larger question of guidance. 

The use of intelligence tests in vocational guidance was 
made a practical issue largely by the application of mental 
tests in the army. A diagram is presented in the Army Re- 
port (page 829) which gives the median score on the Army 
Alpha and the range of the middle fifty per cent of men who 
were classified as belonging to various occupations. For 
typical differences see p. 452. Yoakum and Yerkes made 
the following comment upon this chart and its implication: 


In order of diminishing intelligence the occupational groups 
represented in Figure 24 may be classified thus: professions, clerical 
occupations, trades, partially skilled labor and unskilled labor. 
The greatest differences in intelligence required or exhibited by 
different occupations appear at the ends of the scale, whereas 
differences in the trained group are relatively slight. Further 
differences in range of intelligence for the various occupations are 
considerable and probably significant. The range in general 
diminishes from unskilled labor to the intellectually difficult pro- 
fessions for the obvious reason that whereas any individual may 
attempt tasks which require relatively little intelligence or
education, only able individuals can succeed in the learned professions.
It is well worthy of remark that whereas a group of army laborers
contains few individuals of high grade intelligence (A or B ratings)
the group of engineering officers contains very few except high
grade individuals.

The concrete significance of general intelligence testing is
difficult to describe. It is conceivable that some occupation will
show a perfect degree of correspondence between score and success.
If such an occupation were ever found the application of the test
to candidates for positions in that occupation would be seen to be
the best measure possible. No one expects to find such an
occupation. That correspondence between school success and the tests
is relatively high is shown above. Clerical workers succeed in
general in proportion to score; but many other factors are to be
considered even in these cases of positive correlation.


1 Yoakum and Yerkes. Army Mental Tests, p. 198. (Reproduced with
permission of the publishers, Henry Holt & Co.)


In judging the value of such facts as these for vocational 
guidance, we must make certain qualifications. In the first 
place, as is pointed out by the authors of the army tests, 
the particular average scores of the men of different occupa- 
tions in the army cannot be taken as indicating accurately 
the average scores in the same occupation in civil life. This 
is due to the fact that there was a different kind of selection 
in the army draft among men of different occupations. In 
the case of certain essential occupations men were exempted 
because of the fact that they were necessary to these occupa- 
tions. These in general were undoubtedly the more intelli- 
gent men. The men who were drafted from these occupa- 
tions, then, represented in general a lower level of intelli- 
gence than the total number of men in the occupations of 
civil life. 

In the second place, we cannot without qualification 
compare the intelligence scores of men in widely different 
occupations as being accurate measures of their native in- 
tellectual capacity. Individuals in clerical occupations are 
undoubtedly better trained to take the ordinary group in- 
telligence tests than are individuals in occupations which 
are more largely manual and which deal with things rather 
than with symbols. The comparison by means of intelli- 
gence tests within a given occupation is more reliable, then, 
than a comparison between occupations. The effects of 
training are in all probability not great enough to vitiate all 
such comparisons, but they undoubtedly make it necessary 
to make an allowance in judging the difference in intelligence 
of men of different occupations. 

How, now, may intelligence tests be used in giving voca- 
tional advice? Their primary use is to determine whether 
an individual has the necessary intellectual capacity re- 
quired in a given occupation. To put it in another way, we 
may use the chart to determine in a very general way those 
occupations or groups of occupations for which the indi- 
vidual’s intelligence renders him capable. Assuming that 
the army table is reliable, an individual whose letter rating 
on the army test is B possesses the intelligence necessary to 
succeed in the group of occupations listed after the letter B, 
near the bottom of the table. If his intelligence rating is C, 
his capacity should be sufficient to enable him to succeed in 
the large group of occupations listed after this letter. It 
should be kept in mind, however, that there is a large over- 
lapping of the distribution of intelligence among the various 
occupations, indicating that other qualifications in addition 
to intelligence determine the choice of a vocation, and that 
other qualifications are necessary for success. 

The statements made in the preceding paragraph are true 
only in so far as the general intelligence requirement is con- 
cerned. The facts shown in the chart do not give us any 
warrant for predicting a person’s success in any of the 
occupations which are listed. We can only say that an 
individual has the necessary qualifications in respect to intelli- 
gence. He may or may not possess other essential qualifica- 
tions. Intelligence tests, then, cannot be used without 
further evidence to predict success. A certain minimum 
intellectual capacity may be required for success in an 
occupation, but other qualifications are essential also. In 
such cases, predictions of success must be based upon ad- 
ditional evidence as well as upon the intelligence tests. 

It will be noticed that large numbers of occupations pos- 
sess substantially the same intelligence requirements, or 
that the individuals in these occupations have substantially 
the same median intelligence. This does not necessarily 
mean that a person of a given intelligence could succeed 
equally well in these occupations. In order to judge of the 
individual’s probable success it is necessary to have a means 
of estimating or of measuring the other qualifications. The 
most that can be hoped of intelligence tests in vocational 
guidance is that we can say either that an individual probably 
does not possess the intelligence necessary in certain occupa- 
tions and therefore it would not be wise for him to enter upon 
them, or that he does possess the intelligence required in 
certain groups of occupations, and, if he possesses the other 
necessary qualifications, he may reasonably expect to succeed. 

The extent to which intelligence alone may safely be 
used as a basis of prediction or guidance may be inferred 
from the correlation coefficient between the intelligence score 
and success in the occupation. This correlation differs 
very widely. In the case of a group of 106 graphotype 
operators of the treasury department, for example, the cor- 
relation between output and Army Alpha score was -.087.¹ 
Obviously, while a certain minimum intelligence is probably 
necessary for success in this vocational activity, the posses- 
sion of intelligence beyond this minimum does not give any 
indication whatever of one’s degree of success in it. In the 
case of seventy-three employees of the civil service com- 
mission, on the other hand, the correlation between the 
Army Alpha score and the civil service rating was .53. In this 
case, intelligence, although it is only one of the essential 
factors of success, gives some indication not only that an 
individual possesses the minimum intelligence necessary, 
but also what the degree of his success will be. 

1 Army Report, p. 837. 
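
The contrast between the two coefficients may be made concrete by squaring them. Under the usual linear interpretation, the square of a correlation coefficient is the proportion of the variation in the success measure that is associated with the test score. The short Python sketch below is illustrative only; the function name and the group labels are our own and do not come from the Army Report.

```python
def variance_accounted_for(r):
    """Square of a correlation coefficient: the proportion of variance in
    one measure that is linearly associated with the other."""
    return r * r

# Coefficients quoted in the text (Army Alpha score against a success measure).
quoted = {
    "graphotype operators (output)": -0.087,
    "civil service employees (rating)": 0.53,
}

for group, r in quoted.items():
    share = variance_accounted_for(r)
    print(f"{group}: r = {r:+.3f}, r squared = {share:.3f}")
```

On this reading the graphotype coefficient accounts for less than one per cent of the variation in output, and even the civil service coefficient leaves roughly 72 per cent of the variation in ratings to factors other than the test score.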

The possibility of using general intelligence tests for vo- 
cational guidance may be summarized as follows. If the 
individual’s intelligence rating is high, he may be encouraged 
to look toward a career in one of the professions, or in one of 
the more responsible positions in the field of business. If 
his rating is medium, he may look toward a clerical occupa- 
tion, or a skilled trade. If it is low, his sphere of useful 
activity will lie in the field of semi-skilled or unskilled labor. 
In any of the three cases success cannot be guaranteed by 
his intelligence rating. In the professions, or in business, 
for example, personal qualifications are as necessary as is 
intelligence. Possession of a requisite intelligence rating 
means only that an individual will not be handicapped by a 
lack in this sphere. This, of course, is well worth knowing, 
and its knowledge may prevent many failures. Besides 
preventing failures, the intelligence test may also point the 
individual to higher realms of achievement than he or his 
associates had contemplated. While these statements sug- 
gest only a modest use for intelligence tests in guidance, a 
much more ambitious program would be likely, so far as 
our present knowledge goes, to bring disaster. 


6. Tests or teams of tests selected by the empirical method 


The type of test which is referred to in this section is 
characterized more by the method of its selection than by 
the nature of the test itself. The test may measure an 
element in the ability required by the vocation, or it may 
measure the ability as a whole. The method of selecting 
these tests is to try them out by finding their correlation 
with a measure of achievement in the vocation. Tests 
which give a good correlation are retained, and those which 
give a low correlation are rejected. In some cases the tests 
which are retained are combined into a team of tests. This 
procedure we call the empirical method of selection. 

It is not to be supposed that the selection of tests by the 
empirical method begins by a purely random search. The 
tests which are to be tried out are those which the psycholo- 
gist has some notion may prove to be successful. He has a 
more or less vague suspicion that the test measures one or 
another of the capacities required in the vocational activity. 
He does not, however, attempt to make a careful analysis 
of the capacity, on the one hand, or the test on the other. 
He relies chiefly upon the calculation of correlations to 
determine which tests are good. 
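
The screening step of the empirical method may be sketched in a few lines of Python. The sketch is illustrative only: the test names, the scores, and the cut-off of .40 are hypothetical and are not taken from any of the studies cited, and the source does not state what size of coefficient was treated as good.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def select_tests(test_scores, criterion, threshold=0.40):
    """Keep each test whose correlation with the criterion reaches the threshold.

    The threshold is a hypothetical cut-off chosen for illustration.
    """
    kept = {}
    for name, scores in test_scores.items():
        r = pearson_r(scores, criterion)
        if r >= threshold:
            kept[name] = r
    return kept

# Hypothetical try-out: six workers, one measure of achievement, three tests.
criterion = [10, 12, 15, 18, 20, 25]
trial = {
    "tapping": [30, 45, 28, 50, 33, 40],
    "cancellation": [11, 13, 14, 19, 21, 24],
    "memory span": [5, 7, 6, 8, 7, 9],
}

kept = select_tests(trial, criterion)
for name, r in kept.items():
    print(f"retain {name}: r = {r:.2f}")
```

With so few cases the coefficients are of course unstable; the point of the sketch is only the order of operations, namely try out, correlate, and retain or reject.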

An illustration of this type of procedure, and of the tests 
which are selected by it, may be taken from the study of 
tests for telegraphers, by Jones.¹ Jones took a series of the 
tests which had been included in the Woolley-Fischer series 
and gave them to a number of telegraphers in a continuation 
school. He then calculated the correlation between each of 
the tests and the progress which was made by the students, 
and on the basis of this correlation determined which were 
satisfactory for the purpose of selection. Other illustrations 
of the application of this method, and the selection of this 
type of test, will be given in the discussion of the use of tests 
in connection with different types of jobs. 

1 E. S. Jones. “The Woolley Tests Series Applied to the Detection of 
Ability in Telegraphy”; in Journal of Educational Psychology, vol. 8, pp. 
27-34. 1917. 


7. The analysis of ability and the selection of a test or series of 
tests to measure the element or elements of the ability 


The procedure of the next type of test to be described 
differs from the procedure described in the preceding section 
only in the degree of analysis or the carefulness of the analy- 
sis which is used in the selection of the test. The tests of the 
preceding sections shade over almost imperceptibly into 
those mentioned in this section. 

The outstanding example of a series of tests selected by 
analysis and designed to measure the elementary capacities 
which make up a complex vocational activity is the series of 
music tests designed by Seashore.¹ Seashore’s series consists 
altogether of thirty-odd single tests. Their nature may be 
gathered from the mention of five of the more fundamental 
tests. These are first, a test of pitch discrimination; second, 
a test of the discrimination between the intensity of tones; 
third, a test of the recognition of the relation between time 
intervals; fourth, a test of the memory of tones; fifth, a test 
of discrimination between harmonious and inharmonious 
combinations of tones. Each of these abilities, along with 
a large number of others, is considered by the author 
of the tests to be an essential to musical performance. 
He has therefore included them in his scale, and has devised 
means of testing them. Seashore does not, however, give 
us correlation data concerning the relationship between 
capacity in these individual traits and musical performance in 
general. 


8. Complex aptitude tests 

In contrast with the series of tests designed to measure 
the elements of vocational capacity are a number of tests 
each of which has been designed to measure a complex 
aptitude as a whole. In the Seashore tests, for example, 
each individual test measures only one small aspect of the 
total capacity. The series must be taken as a whole in 
order to get a measure of the capacity as a whole. In the 
complex tests, however, the entire capacity, or a large part of it, 
is measured by a single test. The procedure which is 
followed in designing a test of this sort is first to analyze the 
aptitude or the capacity which is to be measured, and then, 
instead of designing a series of elementary tests, to design 
one complex test which will measure the whole performance. 
This test may be either of the sort which Hollingworth calls 
the “analogy test,” or the “miniature test.” In the case of 
the analogy test, an activity is required of the subjects which 
involves attitudes and performances which are similar, psy- 
chologically, to those required by the job, but in which the 
content or the apparatus is different. The miniature test 
involves a reproduction of the actual performance which 
is required in the vocation, on a small scale. 

1 C. E. Seashore. The Psychology of Musical Talent. Boston: Silver, 
Burdett & Co., 1919. 

We may take as an illustration of these complex tests the 
Muensterberg test for motor-men.¹ Muensterberg con- 
structed a chart which was intended to represent a street. 
The chart was divided into sections representing the street- 
car track in the middle, and other parts of the street at 
different distances from the track on the side. Numbers 
were inserted in the squares to represent pedestrians or 
different types of vehicles. The color of a number repre- 
sented the direction in which the object was going. The 
prospective employee moved over the chart a screen which 
contained an opening. As the parts of the sections of the 
chart were revealed by the opening the examinee was re- 
quired to indicate which of the objects represented by the 
numbers constituted sources of danger. The speed and 
accuracy of the performance both counted. 

1 Hugo Muensterberg. Psychology and Industrial Efficiency. Boston: 
Houghton Mifflin Company, 1913. 

A test of this general type was also devised by Dodge as a 
means of testing men for the activity of gun pointing in the 
army.¹ We do not have a detailed description of the 
apparatus which Dodge used, but we know that it consisted 
of a mechanism similar in its general operation to the gun 
and requiring an activity similar to gun pointing. Inciden- 
tally, it is interesting to note that Dodge’s apparatus was 
found to be serviceable as a means of training as well as a 
means of testing. This suggests possibilities in the use of 
tests for training which have thus far not been explored. 


9. The activity of the job as a test 

This test goes a step farther than the analogy or miniature 
test. Instead of using a procedure which is like that of the 
vocation, it uses the activity of the vocation itself. In 
testing for typists, for example, the individual may be 
given several lessons in typewriting and his progress meas- 
ured. This procedure assumes that the course of the very 
early part of the practice curve may be used as a means of 
predicting the rapidity of improvement in the later part of 
the practice. Another illustration of the use of the job itself 
may be taken from the tests for file clerks used by the United 
States Civil Service Commission.² The first test of the series 
illustrates the procedure. The individual is given a sheet 
upon which are printed fifty names. To the right of each 
name are five names beginning with the same initial, and 
arranged in alphabetical order. The individual is to indicate 
between which of the pairs of the five names the one on the 
left hand should be filed. 
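
The filing item just described amounts to locating a name within an alphabetized list. A minimal Python sketch using the standard bisect module is given below. The names are invented for illustration, the function name is our own, and the actual Civil Service form is not reproduced.

```python
from bisect import bisect_left

def filing_position(name, filed_names):
    """Return the index at which `name` belongs in an alphabetized list.

    A result of k means the name is filed between filed_names[k - 1] and
    filed_names[k]; 0 means before the first name, and len(filed_names)
    means after the last. Comparison ignores capitalization.
    """
    keys = [n.lower() for n in filed_names]
    return bisect_left(keys, name.lower())

# Hypothetical item: five names with the same initial, already in order.
filed = ["Baker", "Barnes", "Bennett", "Brooks", "Burke"]
position = filing_position("Bishop", filed)
print(f"File 'Bishop' between {filed[position - 1]} and {filed[position]}")
```

Scoring an examinee's answer would then be a matter of comparing the pair he marks with the pair returned here.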

1 R. M. Yerkes. “Report of the Psychological Committee of the 
National Research Council”; in Psychological Review, vol. 26, pp. 83-149. 

2 L. J. O’Rourke. Report of the Research Section, in the Annual Report 
of the Chief Examiner and the Director of Research of the United States Civil 
Service Commission. Washington: Government Printing Office, 1923. 

If we review the different types of tests which have been 
described, we see that they form a series varying from 
those which are very general to those which are very 
specific. General intelligence tests aim not to measure the 
specific activities required in the vocation, but the general 
capacities which may be common to a large number of vo- 
cational activities. The empirical tests or the analytical 
tests aim to measure the elements which are required, but 
do not measure the total complex activity directly. The 
vocational complex tests aim to measure a performance 
which is similar to that required by the vocation, whereas 
the vocational samples aim to measure the vocational 
activity directly. 


10. The use of vocational tests for various types of jobs; 
routine operations in the factory 

The work of factory laborers has become so highly special- 
ized that the activities which are performed seem to offer a 
good opportunity for the use of mental tests for selection. 
In attempts to gauge the capacity of an individual to per- 
form successfully the work of a complex vocation, such as a 
profession or the job of an executive, only a general test or a 
long series of tests which together gives a composite of abili- 
ties could by any stretch of imagination be conceived as 
furnishing a satisfactory measure of capacity or of aptitude. 
In the case of minutely specialized jobs in the factory, how- 
ever, the requirement is much narrower, and the possibility 
of securing a satisfactory specific test is much greater. An 
illustration may serve to indicate the possibilities in this 
field. 

The most systematic and extensive study of the use of 
mental tests in the selection of factory employees is the one 
made by Link.¹ Link’s work was carried on during the War 
at a time when the pressure for efficiency was very great, and 
high wages made it possible to carry on a fairly rigid process 
of selection. With the changed conditions since that time 
there has been less demand for this particular application 
of vocational tests. Link’s work indicates, 
however, the possibility of selection of individuals for highly 
specialized tasks by means of mental tests. 

1 H. C. Link. Employment Psychology. New York: The Macmillan 
Company, 1919. 

Link used the empirical method combined with analysis. 
The application of this may be illustrated in his test of shell 
inspectors and shell gaugers. The work of the inspectors 
was to examine shells and pick out those which were de- 
fective. The work of the gaugers was to apply a measur- 
ing instrument to the shells and reject those which were 
not of the correct size. Link tried out sixteen preliminary 
tests on these two groups. As a consequence of the pre- 
liminary trial he made a more careful trial of seven tests 
and correlated the scores of the workers with the amount 
of their production. The correlations between the scores 
on each of the tests and production for the 
two groups are as follows: 


                                  INSPECTORS    GAUGERS 
                                     (52)         (51) 
[test name illegible]                 .56          .05 
[test name illegible]                 .14          .12 
Cancellation                          .63      [illegible] 
[test name illegible]                 .14          .18 
Number group checking                 .12      [illegible] 
[test name illegible]             [one value legible: .38; column uncertain] 
[test name illegible]             [one value legible: .24; column uncertain] 


It will be seen that these two jobs require different abili- 
ties. The inspection requires accuracy of discrimination as 
well as quickness of response to the objects discriminated. 
Mere rate of movement, however, as measured by tapping, 
is not a satisfactory measure of the aptitude required of in- 
spectors. Rate of movement, on the other hand, is the main 
requirement for the gaugers, whereas for them discrimina- 
tion is not important. When these tests were used in the 
intensive study of a group of employees, it was found that 
94 per cent of those who were successful passed the tests. 
All of those who failed at the work also failed the tests. In the next 
place, those who passed the tests continued in employment 
in the factory about nine times as long as those who did not. 
The average length of employment of those who passed the 
tests was 9.56 weeks, and of those who did not pass, 1.05 weeks. Under 
certain conditions, then, tests may be expected greatly to 
reduce labor turnover. 

Other groups for whom Link devised tests were assemblers, 
clerks, typists and dictating machine operators, machine 
operators, apprentice tool-makers, and machinists. 


11. The operation of machines in a complex situation 

Running a motor car or flying are activities which are 
somewhat more complex than most factory operations. 
Like factory operations they involve chiefly bodily activities, 
and therefore appear to offer a good field for the application 
of tests. 

The experiment of Muensterberg in devising a test for 
street car motor-men has already been mentioned. Muens- 
terberg devised his test by the method of analysis. He gives 
no objective figures concerning the degree of its success in 
selecting employees, but the general statement is made that 
it was found useful. We are still in the dark, however, con- 
cerning the exact value of such a test as this. 

More elaborate and scientific studies have been made of the 
aptitude for flying, and of the possibility of using tests to 
measure this aptitude. Three types or aspects of flying abil- 
ity to which mental tests were applied may be mentioned. 
The first is the responsiveness of the semi-circular canals, 
and the nervous mechanism which is connected with these 
canals, to rotation. The rotation test was widely used. 
It consisted of placing an individual in a chair, rotating it 
rapidly, suddenly stopping the chair, and then noting the 
time required for the person to regain his sense of balance 
and his ability to react to stationary objects. The assump- 
tion underlying the test was that if a person’s mechanism 
for balance is in satisfactory condition, he will remain dizzy 
for a longer period than if it is out of order. After this test 
had been rather extensively used, however, both in the 
United States and in European countries, it was found upon 
examination to have no diagnostic value. 

In the second place, mental tests were used to determine 
the degree of tolerance of the individual to a reduction of the 
oxygen content of the air. This was used as a measure of 
the altitude which the individual could reach without losing 
his mental control. The test was a complex reaction test. 
An individual was to respond to changes in any one of 
three continuous stimuli. This proved to be a sensitive 
measure, and is included here although the capacity which is 
measured is a physical rather than a psychological one. 

Finally we may mention the tests of flying aptitude itself. 
These tests were chiefly of the empirical type. A consider- 
able number of tests which might possibly measure the 
aptitude in question were applied to men in the school for 
aviators. The tests were then correlated individually with 
the rapidity of progress made by these individuals in learn- 
ing to fly. Finally, from those that gave the higher correla- 
tion, teams of tests were selected. These tests were not 
actually applied to the selection of aviators, due to the end- 
ing of the War, but it was calculated that if they had been 
applied considerable saving of time and energy would have 
been effected. Since the experiments reported by Henmon 
gave the most positive results, we may confine our illustra- 
tion to this study.¹ 


1 V. A. C. Henmon. “Air Service Tests for Aptitude in Flying”; in 
Journal of Applied Psychology, vol. 3, pp. 103-09. 1919. 

Ten tests were reported which had been selected from 
forty tried out in a preliminary experiment. The correla- 
tions between nine of these tests and flying ability are given 
as follows: 


(1) Emotional stability, measured by a reaction to a pistol shot 
    a. hand reaction ......................... .26 
(6) Equilibrium choice reaction .............. .08 
(7) Equilibrium, difference ................. -.15 
(8) Extension of curves ...................... .14 
(9) Mental alertness ......................... .20 

The tenth test was a record of previous athletic achievement. 
The correlation was reported as high, but no reliable data 
were obtained. It will be seen that the tests giving the 
higher correlations are the first three and the mental alert- 
ness test. From this list a composite series or team was 
composed, and this gave a correlation with flying ability of 
.70. On the basis of this team of tests a prediction was made 
of the flying ability of fifty new cadets. Five of these were 
recommended on paper for discharge. Three were in fact 
discharged on account of poor flying ability after four, 
twenty and twenty-two hours of flying respectively. The 
fourth was commissioned after eighty-five hours, and the 
fifth after ninety-three, whereas the median time was sixty. 
Two men who showed special aptitude were rated very 
good in the test. 
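
The formation of a team of tests may be sketched as follows. The Python below puts each test on a common scale by converting to standard scores, sums them with equal weight, and correlates the composite with the criterion. It is a sketch under stated assumptions: the source does not say how Henmon weighted the tests, so the equal weighting is our own simplification, and the scores for the six cadets are hypothetical.

```python
from math import sqrt

def z_scores(values):
    """Convert raw scores to standard scores (mean 0, standard deviation 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("scores are all identical; cannot standardize")
    return [(v - mean) / sd for v in values]

def pearson_r(xs, ys):
    """Product-moment correlation as the mean product of standard scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

def team_score(tests):
    """Equal-weight composite: each person's standard scores summed across tests."""
    standardized = [z_scores(scores) for scores in tests]
    return [sum(person) for person in zip(*standardized)]

# Hypothetical scores of six cadets on two tests, and a rating of flying progress.
test_a = [4, 6, 5, 8, 7, 9]
test_b = [20, 18, 25, 22, 30, 28]
progress = [2, 3, 3, 5, 4, 6]

composite = team_score([test_a, test_b])
for label, scores in (("test A", test_a), ("test B", test_b), ("team", composite)):
    print(f"{label}: r with progress = {pearson_r(scores, progress):.2f}")
```

A weighted composite, such as one derived from a regression equation, would be the natural refinement, but the equal-weight sum is enough to show how several modest single correlations can be pooled into one predictive score.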

A later report by Stratton and others gives lower corre- 
lations, but this method of procedure offers possibilities of 
ultimate success.¹ 


1 G. M. Stratton, H. G. McComas, J. E. Coover, and E. Bagby. “Psycho- 
logical Tests for Selecting Aviators”; in Journal of Experimental Psychology, 
vol. 3, pp. 405-23. 


12. Clerical and office work 


A variety of different types of work are included under 
this general head of clerical and office work. In some cases 
the work consists chiefly of the manipulation of machines, in 
other cases chiefly the use of language, while in still other 
cases it consists chiefly in dealing with persons. The various 
activities are classed together, not because they are pre- 
sumed to involve the same mental capacity, but because they 
commonly occur together in practice and are frequently 
carried out by the same person. 

We may begin with the humble messenger boy. Several 
tests have been given in the search for one which will serve 
as a measure of efficiency in the miscellaneous duties re- 
quired of such boys, and also in order to select those who are 
fitted for rapid promotion. One report upon tests for 
messenger boys is made by Jones.¹ Jones used the em- 
pirical method and applied a number of tests, including a test 
of immediate memory, an opposites test, and a sentence 
completion test. The average correlation of six of the tests with the 
efficiency of messengers is .44, a rather modest figure. 

The application of an alertness test to office boys is re- 
ported by Scott and Clothier.² The average score of one 
hundred boys who were so tested was 38.7. The average 
score of those who were discharged was 28.1, of those who 
left for better positions, 44.7, and of the twenty-nine who 
were promoted to junior clerkships, 46.2. This illustrates 
the use of tests of office boys chiefly for the purpose of deter- 
mining promotion. 

1 E. S. Jones. “The Woolley-Fischer Test of Telegraphers”; in Journal of 
Educational Psychology, vol. 8, pp. 27-34. 1917. 

2 Walter D. Scott and Robert C. Clothier. Personnel Management. A. 
W. Shaw Company, 1923. 

A number of tests have been tried out for the purpose of 
diagnosing aptitude for learning typewriting. These tests 
vary from general intelligence tests to rather highly special- 
ized tests of motor ability. The specialized type of test is 
the more typical and we may mention two or three by way of 
illustration. Rogers reports a moderate correlation between 
a number of tests and typewriting ability.¹ These tests 
include number-checking, color-naming, action-agent, verb- 
object, agent-action, and form-recognition. The correla- 
tions of the individual tests vary from .248 to .438. Rogers 
also combined the individual tests into teams and found the 
highest correlation for the combination of the color- 
naming, the verb-object, and the number-checking tests. 
Hollingworth and Poffenberger report a somewhat higher 
correlation from similar tests.² 

An analytical and specialized aptitude test was tried out 
by Brewington.³ Miss Brewington found the best results 
from a serial reaction test, which was carried out by means 
of a typewriter. The test required that the individual press 
a key as a number corresponding to it appeared behind an 
opening in a screen. When a key was pressed it brought 
into view a new number. The test was tried out with two 
classes before they had had typewriting instruction, and 
their score was correlated with the progress which they 
made under instruction. The correlations ranged from .59 
to .73. These correlations are unusually high for a single 
aptitude test. 

1 H. W. Rogers. “Psychological Tests for Stenographers and Type- 
writers”; in Journal of Applied Psychology, vol. 1, pp. 268-74. 1917. 

2 H. L. Hollingworth and A. T. Poffenberger. Applied Psychology. New 
York: D. Appleton & Co., 1917. 

3 Ann Brewington. American Shorthand Teacher, vol. 4, September, 
1923, p. 1, and October, 1923, p. 50. 

A number of tests have been given to prospective tele- 
graphers. Among these may be mentioned, by way of 
illustration, the tests by Jones and Thurstone.⁴ Jones gave 
a variety of tests to twenty-two boys who were studying 
telegraphy in a continuation school. The scores on the 
tests were correlated with the ranking of the progress made 
by the boys in their practice. The tests were similar to those 
which he gave to messenger boys. They consisted of a 
number of the tests which were contained in the Woolley- 
Fischer series, and constitute an empirical selection. The 
correlation of the composite of the tests with success in 
telegraphy was .50, and the correlation of a selection of six 
of the tests ranged from .60 to .80. These correlations, as 
will be seen, are relatively high. 

4 E. S. Jones. Op. cit. 

L. L. Thurstone. “Mental Tests for Prospective Telegraphers”; in 
Journal of Applied Psychology, vol. 3, pp. 110-17. 1919. 
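
Since the criterion in Jones's study was a ranking of progress, a rank-difference coefficient is one natural way to express such a correlation. The Python sketch below applies the familiar rank-difference formula, 1 - 6(sum of d squared) / n(n squared - 1), to hypothetical data for five boys. It is an assumption on our part that a coefficient of this kind is appropriate here; the source does not say which coefficient Jones computed, and the scores shown are invented.

```python
def ranks(values):
    """Rank values from 1 (highest) to n; this simple version assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def rank_difference_rho(test_scores, progress_ranks):
    """Rank-difference correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(test_scores)
    score_ranks = ranks(test_scores)
    d_squared = sum((a - b) ** 2 for a, b in zip(score_ranks, progress_ranks))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical data: test scores of five boys and the instructor's ranking
# of their progress (1 = best progress).
scores = [52, 47, 61, 39, 44]
progress_rank = [1, 3, 2, 5, 4]

rho = rank_difference_rho(scores, progress_rank)
print(f"rank-difference correlation = {rho:.2f}")
```

With only twenty-two cases, as in the study described, a coefficient of this kind is subject to a wide margin of chance error, which is worth bearing in mind when the reported figures are called relatively high.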

Thurstone gave eight tests, selected partly with the view 
of their probable service as diagnostic tests, to 165 students 
in a school of radio telegraphy. A somewhat specialized 
test, designed for the purpose, was a rhythm test, requiring 
that the individual indicate on paper the time pattern of a 
series of sounds given on a buzzer. This correlated .48 with 
success in practice. The other tests, which were familiar 
components of general intelligence tests, correlated to a 
lower degree, ranging from .42 down to .21. Three school 
subject tests correlated to a still lower degree. The com- 
bined score on the tests correlated .53 with achievement. 

As a final type we may mention a general clerical exam- 
ination. The first widely used general clerical examination 
was devised by Thurstone.¹ Thurstone’s examination was 
in the general form which has been adopted for intelligence 
tests. It contained a mixture of material commonly used 
in intelligence tests, and material which demanded specific 
knowledge or ability required of clerks. For example: 
Test A consisted of a series of addition problems, and the 
subject was required to check the ones which were erroneous. 
Test B required that misspelled words be indicated in a pas- 
sage of connected discourse. Test C was the familiar can- 
cellation test. Test D was a substitution test. Test E re- 
quired that series of names of cities and of persons be put in 
alphabetical order and classified. The last test was the Match- 
ing Proverbs Test which Thurstone had originally used in his 
intelligence scale. The assumption underlying this examination 
is that the ability required of clerks is substantially what 
we usually measure in our intelligence tests. In order that 
the test may appeal to the clerk or the prospective clerk as 
measuring his ability in his own job, however, the materials 
are, so far as possible, dressed in the disguise of ordinary 
clerical operations. 

1 L. L. Thurstone. “Standardized Tests for Office Clerks”; in Journal of 
Applied Psychology, vol. 3, pp. 248-51. 1919. 

A second illustration of a very similar test is taken from 
the report by O’Rourke of the general clerical examination 
devised for the United States Civil Service Commission.1 
Test 1 requires that the subject indicate the correctness or 
incorrectness of a series of geographical statements. Test 
2 requires the recognition of the similarity or dissimilarity of 
a series of names. Test 3 requires that a series of words be 
put in alphabetical order. Test 4 is an opposite test. Test 
5 is an analogy test. Test 6 is a classification test. Test 7 
is similar in nature to the substitution test. 

It is apparent that, within the general field of clerical and 
office occupations, we have a variety of types of ability re- 
quired. In general the tests for office boys and for general 
clerks approximate the nature of the general intelligence 
tests. Typing and telegraphy appear to require somewhat 
more specialized aptitudes. This general group of tests, 
as contrasted with those of factory operations, or of running 
a street-car, or flying, requires ability of a more general 
nature. We shall find the emphasis on general ability still 
greater in the following illustrations. 

1 L. J. O’Rourke. Report of Research Section; Annual Report of the Chief 
Examiner and Director of Research of the United States Civil Service 
Commission. Washington, 1923. 

13. Tests in the higher ranks of business and in the professions 

A few experiments have been made in the application of 
tests to salesmen. The earliest was by Scott, made in 1915.1 
Scott used a test which is similar in its general content and 
make-up to our modern group point scale, although it was 
not a regularly standardized test. He reports that, by a 
variety of methods of comparison, the correspondence be- 
tween the test and the ability of the salesmen was shown to 
be a fairly close one. A correlation coefficient of .884 was 
given in the case of one group. In contrast with this very 
high correlation is a negative correlation which is reported 
by Oschrin in the case of fifty retail sales persons between 
scores on a variety of mental tests and rating by their 
superiors. 

Widely contrasted results have also been obtained in the 
application of tests to executives. The first case involves 
minor executives. An experiment is cited by Scott and 
Clothier. They gave a test similar to Army Alpha to a 
group of minor executives, and report that the correlation 
between the firm rank and score on the test is .825. In 
contrast to this high correlation Bingham reports no corre- 
lation between a test similar to Army Alpha and success 
rating of seventy-three men.2 These success ratings were 
based upon experience records turned in by the men and 
examined by five independent judges. Bingham explains 
that the entire group of business executives had a test 
rating which is superior to that of army officers. The whole 
group is therefore highly selected with reference to intelli- 
gence. Within this group, however, success seems to depend 
not upon the relatively small differences in intelligence, but 
upon non-intellectual traits. 
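
The effect of selection described above can be illustrated by a small simulation. The following is only a sketch under stated assumptions, not a reconstruction of Bingham's data: the test score and the "other traits" are drawn as independent standard normal variables, success is a weighted sum of the two, and the names `simulate`, `weight`, and `cutoff` are hypothetical. It shows that when only persons above a high cutoff on the test are retained, the correlation between test and success within the retained group falls well below the correlation in the whole population.

```python
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=20000, weight=0.6, cutoff=1.5, seed=0):
    """Return (r in whole population, r in selected group, size of selected group).

    Assumption (illustrative only): success = weight * test score plus an
    independent component standing for non-intellectual traits.
    """
    rng = random.Random(seed)
    scores, success = [], []
    for _ in range(n):
        test = rng.gauss(0.0, 1.0)    # score on the intelligence test
        other = rng.gauss(0.0, 1.0)   # energy, personality, character, etc.
        scores.append(test)
        success.append(weight * test + (1 - weight ** 2) ** 0.5 * other)
    r_all = pearson(scores, success)
    # Keep only the highly selected group: scores above the cutoff.
    kept = [(t, s) for t, s in zip(scores, success) if t > cutoff]
    r_selected = pearson([t for t, _ in kept], [s for _, s in kept])
    return r_all, r_selected, len(kept)


if __name__ == "__main__":
    r_all, r_selected, size = simulate()
    print(f"whole population r = {r_all:.2f}")
    print(f"selected group  r = {r_selected:.2f} (n = {size})")
```

Under these assumptions the drop in correlation arises purely from the narrowed range of test scores; it does not by itself show which non-intellectual traits account for success within the group.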

1 W. D. Scott. “Scientific Selection of Salesmen”; in Advertising and 
Selling, vol. 5, numbers 5, 6, and 7. 1915. 

2 W. V. Bingham. “Intelligence Test Scores and Business Success”; in 
Psychology Bulletin 21, p. 103. 1924. 

A few attempts have been made to measure the relationship 
between academic ability, which may be taken as a 
rough measure of intelligence, and success in vocations, 
particularly in the professions. Nicholson’s experiment has 
already been referred to. He counted the proportion of men 
of various scholarship ranks in Wesleyan University whose 
names appear in Who’s Who. He found that 50 per cent of 
the high honor graduates appear in this volume. One third 
of the men who were elected to Phi Beta Kappa, which is a 
larger group, have their names in Who’s Who, whereas only 
one tenth of the rest of the graduates received this honor. 
Moody, in an unpublished study, reports the correlation between 
scholarship and marks in normal school and the success 
of normal school graduates.1 The correlation was 
found between academic grades and marks in the courses 
in theory and practice and success as measured by salary. 
These correlations are moderate, and the most common is in 
the neighborhood of .30. D. E. Rice found similar correlations 
between the standing of engineering students in the 
engineering school and their salaries.2 The correlations 
varied from .16 to .46. They are therefore about as close as 
those found by Moody. The groups which are here studied, 
like the group studied by Bingham, are of course highly 
selected from the point of view of ability, since college and 
professional students represent only a small portion of the 
population and they uniformly make high scores on intelli- 
gence tests. The evidence before us seems to indicate that, 
within this highly selected group, the correlation between 
intelligence or academic ability and professional success, as 
measured by means of the rather crude devices thus far 
applied, is positive, though very moderate in amount. 

1 F. E. Moody. Correlation of Professional Training with Teaching 
Success of Normal School Graduates. Master’s Thesis, University of Chicago 
Library, 1916. 

2 Cited by Hollingworth. Vocational Psychology, p. 195. 
There is one test which has been devised for the measurement 
of aptitude in a profession which is radically different 
from the general test just referred to. This is the group of 
music tests devised by Seashore.1 Seashore analyzes musical 
capacity into some thirty elementary capacities. He has 
devised a test for each one of these elementary capacities. 
After the application of these tests, he draws a profile curve 
which represents the rank of the individual in each of the 
tests separately. On the basis of this he makes his diagnosis. 
Seashore’s tests have been widely used, and have been found 
to be practically valuable. In a trial at the Eastman School 
of Music, Stanton found a fair correlation between the test 
scores and the teachers’ ratings of students.2 


14. Summary and conclusion 


Mental tests have been experimented with rather widely 
in the attempt to find a method of more effective selection of 
employees and more intelligent guidance of individuals in 
their education and in their choice of a vocation. In 
general, the use of tests in this field must be regarded as in 
its infancy. In the case of a few rather highly specialized 
vocations specific tests have been found which serve as a 
fairly reliable means of selection. In the case of at least 
one of the professional occupations, music, a series of fairly 
comprehensive tests have been worked out which may be 
used in guidance. If we survey the entire field we find that 
both specialized and general tests are in use in selection, 
whereas in guidance general tests are chiefly used. 

1 C. E. Seashore. Psychology of Musical Talent. Boston, 1919. 
2 Hazel M. Stanton. Psychological Tests of Musical Talent. Eastman 
School of Music, University of Rochester, 1925. 

We must distinguish in vocational guidance between the 
use of a test to determine whether or not one’s capacity 
exceeds the critical point in the capacity required in the 
vocation, and the prediction of one’s success on the basis of 
the ability which he possesses. It is possible that we can clas- 
sify the vocations, in general, according to the degree of gen- 
eral intellectual capacity which they demand, and as a con- 
sequence that we can place an individual very roughly in a 
group of vocations for which his general capacity fits him. 
We cannot, in most cases, at the present time, determine 
whether or not one possesses the specialized capacities which 
may be demanded, nor can we determine whether or not one 
possesses the non-intellectual traits, such as energy, person- 
ality, character and so on, which are requisites for success. 
Intelligence tests may serve to indicate roughly whether a 
person is more likely to succeed in an occupation of one 
level than in one of another level. They do not indicate 
which of the occupations at a given level one is best fitted 
for, nor do they indicate at all definitely whether one is 
likely to succeed in any occupation. Finally, while mental 
tests do not solve the problem of guidance, they simplify it 
by reducing the number of unknown factors with which 
we have to deal and thereby render it easier of solution than 
it would be without them. 

CHAPTER XVI 
RELATION OF INTELLIGENCE TO DELINQUENCY 


1. The problem 


Is a person who commits crime always, or commonly, de- 
ficient in intelligence? Is a mentally defective person 
always a potential criminal? If the relationship is not as 
close as this, is there a relationship of a marked character 
between mental deficiency and crime, or some form of misconduct? 
This is the problem of the chapter. 

We may introduce the evidence on the question and the 
discussion on the evidence by quotations which indicate 
typical points of view. Goddard, who has made prolonged 
and direct study both of feeble-mindedness and of criminality, 
wrote the following in 1914: 1 


Every feeble-minded person is a potential criminal. This is 
necessarily true, since the feeble-minded necessarily lacks one or 
the other factors essential to a moral life — an understanding of 
right and wrong, and the power of control. If he does not know 
right and wrong, does not really appreciate this question, then of 
course he is as likely to do the wrong thing as the right. Even if 
he is of sufficient intelligence and has the necessary training so that 
he does know, since he lacks the power of control he is unable to 
resist his natural impulses. 


1 H. H. Goddard. Feeble-Mindedness, p. 514. 1914. This quotation 
should not be understood to represent Dr. Goddard’s present opinion. A 
personal letter indicates that he has modified his view in the light of recent 
evidence. The quotation is included because it expresses a widely prevalent 
view. 

A second quotation represents the opinion that there is a 
relationship between feeble-mindedness and crime, but that 
it is much less close than is suggested by Goddard. The 
following quotation is from a study by Bronner.1 Group S, 
to which Miss Bronner refers, is a group of servant girls who 
were tested in the same fashion as Group D, which is the 
delinquent group. 


But the results obtained by Group S show that this lack of 
capacity of Group D in and of itself does not explain the fact of 
delinquency. For Group S was no more gifted, yet contains only 
members who are not and have not been delinquent, as far as 
known. 

Since Groups D and S when compared prove to be quite on a par 
as far as general intelligence is concerned, we must conclude that 
the explanation of delinquent tendencies shown by members of 
Group D is something other than the intellectual status alone. 
This does not mean, of course, that mentality may not be one 
factor, but at least there must be other factors as well which cause 
these individuals to engage in careers which lead them into conflict 
with the law, while others of like mentality experience no such 
difficulty. 


Summing up the results of the application of the Army 
Alpha Scale to several thousand convicts, Murchison makes 
the following comment on the relationship of intelligence 
to crime: 2 


It would seem that statutory crime and crimes of physical injury 
are causally related very slightly to intelligence, in so far as intel- 
ligence can be measured by mental tests, but more than one half of 
the individuals who commit crimes of fraud are superior individ- 
uals, according to the same standard. Crimes of social dereliction 
are committed by a large percentage of unusually superior individ- 
uals, and also by a large percentage of unusually inferior individuals. 
Temperament must play a much larger rôle than intelligence in the 
commission of statutory crimes, crimes of physical injury, and 
crimes of social dereliction. Of course, it is quite possible that 
temperament, meaning by temperament the emotional complex, 
plays the chief rôle in the commission of all crimes. 


1 Augusta F. Bronner. A Comparative Study of Intelligence of Delinquent 
Girls, p. 86. Teachers College, Columbia University, 1914. 

2 Carl Murchison. “American White Criminal Intelligence”; in Journal 
of Criminal Law and Criminology, vol. 15, pp. 239-316. 1924. 

The table which forms the basis of these conclusions by 
Murchison is given below, as Table XXXV. 

We have here, then, three widely divergent views. The 
first is that mental deficiency is the chief cause of crime, or at 
least very highly correlated with criminality; the second is 
that mental deficiency is only one of the factors of crime, 
and the third is that mental capacity is apparently related 
differently to different types of crime, and may possibly not 
be an important factor in crime at all. All of these opinions, 
though so widely divergent, are based upon the interpreta- 
tion of the application of mental tests to delinquents. What 
facts have been discovered about the application of mental 
tests, and how does it come that their interpretation could 
be so divergent? 


2. Results of statistical studies 


There are two types of estimates which we may attempt 
to make and upon which we may base our judgment con- 
cerning the relationship of delinquency to mental deficiency. 
In the first place, we may attempt to determine the propor- 
tion of mentally defective individuals among delinquents in 
comparison to the proportion of mentally defective individu- 
als in the population at large. The question to be answered 
by this type of investigation is: Are there more feeble-minded 
individuals among criminals, or those who commit mis- 
demeanor, than there are among people in general? This 
is the type of comparison which has been most frequently 
made. 

A second comparison, however, might also be made, and 
for some purposes may be the more significant. We may 
ask: What is the proportion of delinquents among mentally 
defective groups in comparison to the proportion of de- 
linquents among people in general? How does the probabil- 
ity that a mentally defective individual will commit a crime 
compare with the probability that an intellectually normal 
individual will commit a crime? We shall find the 
answers to these questions, simple as they seem, not at all 
easy. We shall find, furthermore, that the diversity in 
the results of the investigations of these questions is largely 
responsible for the diversity in the interpretation of the facts 
showing the relationship between intelligence and conduct. 

In attempting to estimate the proportion of criminals 
among the feeble-minded, or the proportion of feeble-minded 
among criminals, the greatest importance, of course, should 
be attached to the results of studies in which mental tests 
have been applied. Because of its extensive scope, and the 
careful statistical treatment which is applied to it, however, 
reference may be made at the outset to the study by Charles 
Goring, in England.1 Goring relied entirely upon the judgment 
of mental capacity by observers, and considered the 
judgment of weak-mindedness to be entirely reliable. In 
those cases in which he had estimates of intelligence made, 
he used five categories, and had the individual placed in one 
of these five, namely, intelligent, fairly intelligent, unin- 
telligent, weak-minded, and imbecile. Out of a group of 
496 convicts, 97 were judged weak-minded and imbecile, 
which gives an estimate of 20 per cent (page 171). Goring 
quotes other estimates, however, which place the percentage 
at 10 per cent. He therefore considers that the proportion 
of feeble-minded among convicts ranges from 10 to 20 per 
cent. Against this estimate, Goring sets the estimate of the 
proportion of the feeble-minded in the general population as 
.46 per cent. 

1 Charles Goring. The English Convict. Abridged edition, London, 
1919. 

Goring also furnishes estimates on the other question, 
namely, the proportion of feeble-minded who commit crime, 
and the proportion of persons of normal intelligence who 
commit crime. He estimates that about 7.2 per cent of the 
total male population of England are at some time convicted 
of crime. Miner cites as the percentage of defectives who 
commit crime 63 per cent.1 According to these estimates, 
then, about twenty times as many criminals are feeble- 
minded as non-criminals, and about nine times as many of 
the feeble-minded commit crime as of the population as a 
whole. This would, of course, indicate some sort of relation- 
ship, either direct or indirect, between feeble-mindedness 
and crime. The correlation between the two as calculated 
by the four-fold method is given by Goring as .655. 
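
The arithmetic behind these comparisons may be checked directly from the percentages quoted in the text. The sketch below uses only those figures; the variable names are chosen here for illustration. It computes the share of Goring's 496 convicts judged weak-minded or imbecile, the ratio of the convict estimates (10 and 20 per cent) to the estimate for the general population (.46 per cent), and the ratio of Miner's 63 per cent to Goring's 7.2 per cent.

```python
# Percentages quoted in the text (Goring and Miner).
FM_AMONG_CONVICTS_LOW = 10.0     # per cent, lower estimate quoted by Goring
FM_AMONG_CONVICTS_HIGH = 20.0    # per cent, from Goring's own sample
FM_IN_POPULATION = 0.46          # per cent of the general population
CRIME_AMONG_DEFECTIVES = 63.0    # per cent, cited from Miner
CRIME_IN_MALE_POPULATION = 7.2   # per cent, Goring's estimate


def ratio(a, b):
    """How many times as large a is as b."""
    return a / b


# Share of the 496 convicts judged weak-minded or imbecile.
observed_share = 100 * 97 / 496

low_ratio = ratio(FM_AMONG_CONVICTS_LOW, FM_IN_POPULATION)
high_ratio = ratio(FM_AMONG_CONVICTS_HIGH, FM_IN_POPULATION)
crime_ratio = ratio(CRIME_AMONG_DEFECTIVES, CRIME_IN_MALE_POPULATION)

print(f"weak-minded among 496 convicts: {observed_share:.1f} per cent")
print(f"convicts vs. population: {low_ratio:.1f} to {high_ratio:.1f} times")
print(f"defectives vs. population in crime: {crime_ratio:.2f} times")
```

On these figures the "about twenty times" of the text corresponds to the lower (10 per cent) estimate, and the "about nine times" to the ratio of 63 to 7.2.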

A large number of studies have been reported in which 
mental tests were given to various groups of delinquents or 
of criminals. Pintner reports 2 thirty-one such studies of 
children, and thirteen of adults. He gives the date of the 
study, the number of cases, and the percentage which were 
estimated as being feeble-minded. A survey of such a table 
indicates the extreme diversity in the number who are 
estimated to be feeble-minded by various investigators. In 
the case of children the estimates vary from 7 per cent by 
Ely and Miner to 93 per cent in one case of a study by Hill 
and Goddard. In the studies of adults the estimates vary 
from 17 per cent to 54 per cent. 

In the light of such extreme diversity, it is necessary to 
inquire whether some general principle cannot be discovered 
which may serve as an explanation for the diversity and 
lead us to a correct estimate. One general characteristic of 
the data is that the later reports in general give a smaller 
estimate of feeble-mindedness among delinquents than do 
the earlier ones. Pintner has calculated from his table that 
the median per cent of the first sixteen reports of studies of 
children is 64, while the median of the last sixteen percentages 
is only 26. 

1 J. B. Miner. Deficiency and Delinquency, p. 215. Baltimore, 1918. 
2 R. Pintner. Intelligence Testing, pp. 266-5. 

In the studies of adults, the median of the 
percentages in the six earlier reports is 42, and in the seven 
later ones it is 29. It is evident, then, that there is a 
disposition to estimate the proportions as much less than 
was estimated in the earlier studies. 

A second general fact is that the proportion of feeble- 
mindedness found among women offenders is ordinarily 
much higher than among men. In Crane’s study,1 made in 
1915, for example, the proportion of feeble-mindedness found 
among 809 boys was 39 per cent, while among 386 girls it was 
72 per cent. Murchison, in a study of men and women prisoners, finds 2 
that the median Alpha score of the men is 62, the same as 
that in the army, whereas that of the women is only 35. 

The diversity in the various reports, and especially the 
trend toward a more conservative estimate in the later 
reports, and the difference in the estimate of men and women, 
seem to indicate that our problem cannot be settled in quite 
the wholesale fashion which has sometimes been attempted. 
Two questions are raised by the situation before us. The 
first is whether there may not be certain technical difficulties 
in the investigation of the problem which have not always 
been satisfactorily met. It may not be such an easy matter 
to determine the proportion of mental defectives among 
criminals as appears on the surface, even with the use of 
mental tests. The second question is whether the bearing of 
intelligence upon crime may not depend in part upon the 
circumstances and upon the nature of the delinquency. 

1 H. W. Crane. Report on Feeble-Mindedness, Epilepsy, and Insanity in 
Michigan. Lansing, Michigan, 1915. 

2 Carl Murchison. “Criminals and College Students”; in School and 
Society, vol. 12, pp. 24-30. 1920. 

First, then, are there any technical or methodological 
difficulties in the way of estimating the proportion of mental 
defectives among criminals? It is well known that delinquents 
(at least those who are apprehended in court, or 
who are confined in reformatories or prisons) are largely 
drawn from certain classes of society. They come from the 
poorer neighborhoods in the towns or cities, and in the main 
from homes of unskilled or semi-skilled workers. Now it has 
been shown repeatedly that the children from these neigh- 
borhoods or from these types of homes make lower scores on 
mental tests than children from the better districts or from 
the homes of business and professional men. It seems 
obvious that an estimate of the proportion of mentally de- 
fective children can only be fairly made when delinquent 
children are compared with norms made upon the groups 
from which they are drawn, and yet this has rarely been 
done. 

A noteworthy exception is found in Bronner’s study.1 
Bronner gave a series of five mental tests to a group of 
delinquent girls, and gave the same tests to three other 
groups of girls, namely, a group of college girls numbering 
36, a group of 34 girls in an evening school in a settlement 
house, and a group of 29 servant girls. The results of the 
comparison of the delinquent group with these other groups 
are given in Part III, and may be summarized in Table 
XXXII. 

Bronner’s table shows that none of the delinquent girls 
equal the college girls in the Easy Opposites or the Hard 
Opposites test, and that not more than 6.7 per cent equaled 
them in any of the tests. When the delinquent girls were 
compared with the evening school students, however, from 
10 to 46 per cent equaled the median of this group, and in 
every test the delinquent girls were equal or superior to the 
group of servant girls. 

It will be seen that Bronner’s statement concerning the 
situation, quoted at the beginning of this chapter, is well 


1 Op. cit. 
TABLE XXXII. BRONNER’S COMPARISON OF PERFORMANCE OF 
A GROUP OF DELINQUENT GIRLS WITH THREE OTHER 
GROUPS OF GIRLS IN FIVE MENTAL TESTS 

(The figures show the percentage of the delinquent group which reaches 
or exceeds the other groups.) 

THE COMPARISON GROUP: E (Evening school students) 

Easy opposites 
Hard opposites 
Memory for words 
Memory for passages 
Ebbinghaus Completion Test 

justified by her findings. While the low standing of the 
delinquent girls, in comparison with the population as a 
whole, may be considered of some significance, the fact 
that there are groups who stand equally low and yet are not 
delinquent indicates beyond question that there must be 
other factors which produce delinquency. 

The most widely representative norms upon mental tests 
are of course those which were obtained by the use of the 
army tests. There are now a number of studies of criminals, 
both inside and outside the army, made by means of 
the Army Alpha Scale. These studies should throw new 
light upon our problem. 

In the Army Report (page 802) we are given a comparison 
of the distribution of the intelligence rating of two groups of 
prisoners in comparison with the principal sample. This 
comparison is given in Table XXXIII. 

The prisoners at Leavenworth are evidently slightly 
superior to the white draft in general. The prisoners in the 
guard house, however, are evidently somewhat inferior to 
the draft as a whole. It will be remembered that the Leaven- 
worth prisoners are those who were committed for the serious 
offenses, whereas the guard house prisoners are those who 
committed more trivial offenses. 


TABLE XXXIII. PERCENTAGE COMPARISONS OF INTELLIGENCE 
RATINGS OF PRISONERS CONFINED IN CAMP AND 
THOSE CONFINED AT LEAVENWORTH 

                                   D-     D      C-     C      C+     B      A     Number 
Dix and McClellan prisoners       20.6   25.5   21.6   18.9    8.3    3.4    2.1 
Leavenworth prisoners              6.0   18.8   20.8   23.8   16.0    8.8    5.8 
White Draft, Principal Sample      7.1   17.0   23.8   25.0   15.2    8.0    4.1   94004 
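
The comparison drawn from this table can be made concrete by summing the two lowest and the two highest rating columns for each group. The sketch below assumes that the seven columns run from the lowest letter grade to the highest, an ordering which agrees with the remarks in the text but is an assumption; the function name `tails` is hypothetical.

```python
# Percentages as printed in the table, assumed ordered from lowest to highest rating.
RATINGS = {
    "Dix and McClellan prisoners": [20.6, 25.5, 21.6, 18.9, 8.3, 3.4, 2.1],
    "Leavenworth prisoners": [6.0, 18.8, 20.8, 23.8, 16.0, 8.8, 5.8],
    "White Draft, Principal Sample": [7.1, 17.0, 23.8, 25.0, 15.2, 8.0, 4.1],
}


def tails(percentages, k=2):
    """Return (sum of the k lowest columns, sum of the k highest columns)."""
    low = round(sum(percentages[:k]), 1)
    high = round(sum(percentages[-k:]), 1)
    return low, high


for group, percentages in RATINGS.items():
    low, high = tails(percentages)
    print(f"{group}: lowest two = {low}, highest two = {high}")
```

Read in this way, the guard-house prisoners have nearly twice the draft's share in the two lowest columns, while the Leavenworth prisoners have a share in the two highest columns slightly above that of the draft.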


A recent study reported by Murchison compares the dis- 
tribution in letter grades on the army test of convicts of 
five States with the distribution of the white draft of the 
same States, as given in the Army Report.1 The total number 
of convicts examined was 3942. 

An inspection of Table XXXIV shows that the distributions of 
the letter grades of the convicts and the white draft are 
remarkably similar. In Ohio and Indiana the convicts 
make somewhat lower grades, while in New Jersey they make 
decidedly higher grades. 

A similar study of the convicts in the State of Illinois, 
made by the State Criminologist, Herman M. Adler, shows 
the same close correspondence between the distribution of 
criminals and that of unselected men in the population. 


1 Carl Murchison. “American White Criminal Intelligence”; in Journal 
of Criminal Law and Criminology, vol. 15, pp. 239-316. 1924. 

TABLE XXXIV. COMPARISON OF THE STANDING OF CONVICTS 
OF FIVE STATES WITH THE WHITE DRAFT OF 
THE SAME STATES 

NEW JERSEY 

 8.9 
18.2 
26.5 
24.7 
13.7 | 10.38 
 8.5 | 2.6 


The facts which have been presented seem to indicate that 
at least a major portion of the large difference in intelligence 
between delinquents and persons in the population at large, 
which has been found in earlier studies, is due to the fact 
that the norms which were used in the comparisons were not 
representative norms, or that the groups with which the 
delinquents were compared were not comparable groups. 

Another fact which has appeared incidentally in our dis- 
cussion, and which must be taken into account in any inter- 
pretation of the relationship between delinquency and in- 
telligence, is the diversity in the intelligence of persons 
committing various types of offenses. Persons committing 
certain types of crime are very likely to be of low-grade 
intelligence. Those committing other types differ very 
little from the population as a whole, and those committing 
still other types are more than likely to be superior in intelligence. 
This fact was already noted by Goring; it was 
brought out in an analysis of the scores of the Leavenworth 
prisoners in the Army Report, and it has been reiterated in 
Murchison’s study. We may cite the latter study, by way 
of illustration. The diversity in the intellectual capacity of 
those committing various offenses is shown in Table 
XXXV. On this table Murchison bases the statement which 
was given in the quotation at the beginning of the chapter. 


TABLE XXXV. PERCENTAGE OF INDIVIDUALS COMMITTING VARIOUS 
TYPES OF CRIME WHOSE SCORE IS BELOW C, ON THE 
ONE HAND, OR ABOVE C, ON THE OTHER HAND 

                        INFERIOR        SUPERIOR 
                        (Below C)       (Above C) 
Fraud 
Force 
Thievery 
Statutory crimes 
Physical injury 
Social dereliction 
Crimes of sex 


Similar evidence that crimes of different sorts are com- 
mitted by persons of different degrees of intelligence is pre- 
sented by Goring. He gives the correlation, by the four- 
fold association method, between crime and the proportion 
of feeble-mindedness. His correlations are as follows: 


Fraudulence                    .1201 
Violence                       .8102 
Sex crime                      .4630 
[illegible] and burglary       .5859 
[illegible]                    76 

It is apparent that there is considerable similarity between 
the findings of Goring and of Murchison. The chief 
difference is that Murchison found sex crimes to be com- 
mitted by the least intelligent group of criminals, whereas 
Goring found them committed by those of intermediate 
intelligence between the two extremes. There is sufficient 
similarity to indicate that there is a marked difference in the 
amount of mental deficiency among persons committing the 
various types of crime. 

Before proceeding to the interpretation of these facts, one 
other should be mentioned. This is the fact that while 
there may be a difference in the average intelligence of de- 
linquents and non-delinquents, the two groups are not by 
any means distinct. There is a very large overlapping 
between them. This overlapping is brought out well in the 
results of the study by Fernald, Hayes, and Dawley.1 A 
summary table showing the distribution of mental ages of 
delinquent women and of an army group is shown in Table 
XXXVI. The summary of the study is given in the words 
of the authors: 2 


It appears: (1) That the average mental capacity of the de- 
linquent women whom we have examined is lower than that of any 
group of non-delinquent adults with regard to whom we have data. 
(2) That, however, the above statement does not imply a selection 
of individuals entirely from the lower end of the scale of intelligence 
for the delinquent group. There is, in fact, an extensive amount of 
overlapping of the delinquents with the non-delinquent group. 
The range of the delinquent group is found to be practically co- 
extensive with the army group, our most representative sampling 
of the general population. Further, the difference between the 
means of the delinquent and the non-delinquent groups, while 
affording adequate indication of a distinction between the two 
groups, is not extreme in amount. In other words, this is definitely 
not a case of “all or none” relationship. 


1 M. R. Fernald, M. H. S. Hayes, and A. Dawley. A Study of Delinquents 
in New York State. New York, 1920. 
2 Fernald et al. Op. cit., p. 433. 

TABLE XXXVI. MENTAL CAPACITY AS MEASURED BY STANFORD-BINET. 
PER CENT DISTRIBUTION OF DELINQUENT 
WOMEN AND OF ARMY GROUP, WITH CONSTANTS 

(From Fernald, Hayes, and Dawley, p. 418) 

MENTAL AGE        DELINQUENT WOMEN        ARMY GROUP 

In interpreting these results it must be remembered that 
delinquent women have been found regularly to have lower 
intelligence than delinquent men. 


3. Interpretation 


Before launching upon the interpretation of these facts, 
let us review briefly the facts themselves. The earlier 
studies, both by means of mental tests and by estimates of 
intelligence, indicated that there is a large percentage of
mental defectives among delinquents or criminals, and that
there is a large percentage of criminals among mental de- 
fectives. The estimate of these percentages varied widely, 
but it was usually large. The later studies have resulted in 
a gradually diminishing estimate of the percentage of de- 
fectives among delinquents, until finally the study of con- 
victs by means of the Army Alpha seems to indicate that 
for the more serious crimes, at least, the distribution of 
mental ability among criminals is, on the whole, not much 
different from that in the general population. The second 
significant fact is that there is a marked difference in the 
intelligence of different classes of criminals, and particularly 
between men and women criminals. Finally even in those 
cases in which a difference is found in the intelligence of 
criminals and the population at large, the overlapping be- 
tween the two is very large and the range of intelligence is 
the same. 

A candid examination of these facts leads to the con- 
clusion that mental deficiency is certainly not the sole, and 
probably not the chief cause of crime. In some types of 
crime, in fact, it can hardly be regarded as even a contribu- 
ting cause. In other types it may be regarded as a contri- 
buting cause of such conduct. This is as far as the merely 
statistical facts can carry us. They do not indicate whether, 
even in the case of those crimes in which there is a correlation
between intelligence and criminality, low intelligence
is a positive factor, or whether low intelligence merely re- 
moves certain factors which would otherwise operate to 
prevent misdemeanors. 

Since most of the discussions of the subject have gone 
much beyond the merely statistical evidence, we may ven- 
ture a somewhat broader interpretation in the light of recent 
facts. It has been customary to use the idea of mental age 
in interpreting the relation between mentality and
crime. The mentally defective criminal is described as
similar in mentality to a child, and as therefore lacking in 
responsibility. This may serve as a crude analogy, but it is 
an analogy which can very easily be pressed too far. The 
child does lack responsibility, to be sure, but it is not at all 
clear that this lack of responsibility is solely a matter of 
mental or intellectual limitations. If this were true, the 
mentally gifted eight- or nine-year-old child would be equal 
in social and moral responsibility to the average adolescent. 
This would probably not be asserted by those who use the 
mental-age analogy. The development of responsibility 
in the adolescent age is probably not wholly due to the 
growth in mere intellectual ability which comes at this time. 
In fact, it is conceivable that it does not depend upon in- 
tellectual maturity at all, but upon the development of 
certain social attitudes which come with the ripening of the 
instincts. The child now comes to feel that he is a member 
of a group, having obligations to it which grow out of the 
existence of mutual needs. 

The over-emphasis upon mental maturity as a factor in 
conduct, furthermore, overlooks the obvious fact that chil- 
dren of the same mental maturity, or adults, for that matter, 
who may be similar in intellectual capacity, differ widely in 
their sense of responsibility and in their willingness to con- 
trol their conduct according to accepted standards. Per- 
haps we might go further and say that these individuals 
differ not only in their willingness, but also in their capacity 
to control their conduct. Studies of emotional and of will 
temperament will undoubtedly bring to light facts of 
significance for the control of conduct which observation 
now convinces us exist. The extreme intellectualistic 
explanation of crime is probably derived in part from the 
idea that since crime is stupid it can only be committed by a 
stupid person. This, of course, is a very plausible hypothesis
and would be very convincing were it not that so
many persons who would otherwise be judged intelligent 
commit crime, and so many persons who seem ordinarily 
stupid live lives of rectitude. The statement that every 
person of low intelligence is a potential criminal may be 
matched by the broader statement that every person of any 
intelligence level whatever is a potential criminal. A recent 
conviction for murder of two young men of the highest in- 
tellectual attainment must disabuse every candid person of 
the easy intellectualistic explanation of crime. The roots 
of conduct lie much deeper than the explicit recognition of 
profit and loss. They lie partly in the intellectual processes, 
and partly in the realm of feeling. They go back to heredi- 
tary will temperament, and to habits and attitudes formed 
in the earliest years of life. They depend upon the conventions
which characterize the child’s social environment
and the interaction between his individuality and these 
conventions. The case studies, published by the Judge 
Baker Foundation of Boston, indicate that no simple and 
easy formula can be successfully applied. The lowest grade 
mental defectives are undoubtedly incapable of responsible 
conduct, but this can hardly be said of those of more mod- 
erate degrees of defect, and for the great bulk of human 
beings, intellectual capacity appears to be only a subordinate
contributing factor in conduct.

CHAPTER XVII
INTERPRETATION OF INTELLIGENCE TESTS 


In the first chapter certain questions were raised concerning 
the meaning and interpretation of mental tests. It was 
pointed out that there are rather wide differences of opinion 
concerning the significance of tests. While these differences 
of opinion are more extreme among persons who are not well 
acquainted with tests, and have not used them, than among 
technical psychologists, nevertheless there is some divergence 
also in the opinion of psychologists themselves. In the 
course of the treatment of the development of mental tests, 
of their technique, and of their application in various fields, 
a good many facts have been presented which bear upon these 
problems and which will enable us to arrive at at least an 
approximate answer to them. At the end of the book, 
therefore, we shall attempt to bring together threads of 
facts and interpretation which have run through the dis- 
cussion of the various topics, and weave them into a pattern 
which shall give a summary of the conclusions that seem 
justified in the present stage of the science. 

The questions concerning the interpretation of mental 
tests which press for solution have to do chiefly with intelli- 
gence tests, or tests of general capacity. It is in reference to 
intelligence tests that the interpretation is the most doubt- 
ful and that contradictory interpretations are the most 
common. Furthermore, the tests of general intelligence and 
their results are of greater practical import than are the 
results of specialized tests. This is because they bear, not 
simply upon problems of behavior in detail, but upon large 
social and political questions. The specialized tests do
raise problems, to be sure, but they are largely of a technical
nature which may be settled by particularized research.
Our summary and interpretation in this chapter, therefore,
will have to do with intelligence tests.

Any interpretation which might be made of intelligence 
tests and their larger practical bearing, at the present time, 
is necessarily a somewhat tentative one. While a large body 
of evidence has accumulated from the application of tests 
during the last twenty-five years, and particularly during 
the last ten years, much of this research consists simply in 
the accumulation of data which raise problems instead of 
settling them. Much research of a very fundamental nature 
is necessary in order to give scientific proof upon the pro- 
blems which confront us. Since the results of intelligence 
tests are being used as a basis for far reaching conclusions 
and applications to practical life, however, it is necessary 
that we sum up as best we can the evidence as it exists at the 
present time, and that we point out the possible alternatives 
in interpretation and attempt to indicate those interpreta- 
tions which, in the opinion of the psychologists most com- 
petent to judge, are the probable ones. 


1. Two fundamental problems in interpretation 


Of the problems in the interpretation of intelligence tests 
which have attracted the attention of psychologists, two 
stand out as of wide significance. The first has aroused 
animated discussion in lay as well as in professional circles. 
Expressed in simple terms, it is this: Do intelligence tests 
measure native capacity, as they purport to do, or do they 
merely measure education and experience? The second
question is somewhat more abstruse, but is still of practical 
importance. It is concerned with the nature of intelligence 
—with the constitution of the ability, whether native or 
acquired, which is measured by the intelligence tests. The
first question will be treated in this chapter, and the second
in the next chapter.

It must be frankly recognized, and it is recognized by 
competent psychologists, that our intelligence tests are sub- 
ject to definite limitations. The first limitation is that they 
measure intellectual capacity indirectly rather than directly. 
This must always be the case, since capacity is simply 
potentiality for behavior. A person exhibits capacity only
as he acts, and it is only his acts which we can measure.

The types of limitation which arise from this fact are two. 
In the first place, we do not measure and cannot measure all 
of behavior. We are restricted to the measurement of 
particular samples of behavior. We must assume that these 
samples which we measure have been so selected that they 
constitute fair representatives of the individual’s behavior 
as a whole. 

Within certain limits this assumption is more than an 
assumption, since it is subject to statistical inquiry. We 
may determine statistically how many samples of reaction 
of the kind which are measured in the test are necessary to 
secure in order that the test as a whole may be as reliable as 
it is possible to make it. We may determine, for example, 
whether five of the tests of the kind which are ordinarily 
used in our group point scales are as reliable as are ten, or 
if not, just how many are necessary in order that the maxi- 
mum of reliability may be reached. But the question still 
remains whether our intelligence tests as they are ordinarily 
constituted include all the range of types of behavior which 
are necessary in order that we may have a complete sampling 
of those forms of reaction which constitute intellectual 
activity. 

The second consequence of the fact that intelligence tests 
are indirect measures is that the behavior which they measure
is conditioned not only by native endowment, but by
experience and training. As a consequence, the measures
which we secure by means of these tests are measures not 
merely of native endowment, but also of the results of train- 
ing, or education. That this is true as a general principle 
nobody has recognized more clearly than have the psycholo- 
gists themselves, and they have repeatedly pointed it out 
in their writings. The attacks of popular writers upon intelligence
testing, which are based upon the contention that
tests are in part measures of training, are therefore directed
against a straw man. But the psychologists, unlike the 
popular writers, are not content with recognizing this general 
principle that behavior is conditioned by both endowment 
and training; they endeavor further to analyze the facts and 
to determine as precisely as possible what bearings the facts 
have upon our interpretation of intelligence tests. 

We must note, in the first place, that the problem is not
how far endowment and training affect intelligence test 
scores in general. We are not inquiring what aspects of an 
individual’s behavior at a given moment are determined by 
his original endowment, and what aspects are determined by 
the sum of all the training and experience which he has had 
from birth down to that moment. Neither are we seeking 
to determine what proportion of his conduct or his achieve- 
ment is due to the one or the other factor. Such an inquiry 
is impossible of fulfillment. It would lead us into an in- 
tricacy of analysis that we could not hope to untangle. Our 
problem, on the contrary, is to determine how far differences 
between individuals or groups are caused by original endowment,
or by differences in training. While this is a much
more circumscribed and a much simpler problem, it is by no 
means easy of solution, and nobody pretends that we have 
yet reached its solution. Every individual’s behavior is a 
product of an unknown original endowment, and of an 
almost equally unknown train of experience and of training.
Suppose, now, we compare him with another individual
whose behavior indicates higher intellectual capacity. His 
original endowment and training are also unknown. How 
are we to determine what share the one or the other factor 
has in causing a superiority of one individual over the other? 

We may make an approach to the problem by two 
methods. The first method is a statistical one, and is called 
the method of partial correlation and multiple correlation. 
The partial correlation method may be illustrated in this 
way. Suppose we are finding the correlation between in- 
telligence test scores and school achievement, and our group 
contains children of several different ages. As was pointed 
out in Chapter III, the fact that both test scores and school
achievement are correlated with age will raise the correla- 
tion between these two factors. By using an appropriate 
formula the effect of age can be eliminated, so that we can 
find what the correlation would be if we had a group all of 
the same age. 
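
The "appropriate formula" referred to above is, in its usual first-order form, the partial correlation of two variables with a third held constant. The sketch below applies it to the illustration in the text. The three correlations are hypothetical figures, not results from any study cited here.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return numerator / denominator


# Hypothetical correlations in a group of mixed ages (assumed values):
r_test_school = 0.70   # test score with school achievement
r_test_age = 0.60      # test score with age
r_school_age = 0.60    # school achievement with age

r_partial = partial_correlation(r_test_school, r_test_age, r_school_age)
print(round(r_partial, 3))  # 0.531
```

With these assumed figures the correlation falls from .70 to about .53 when age is held constant, which illustrates how a common dependence upon age raises the raw correlation between test scores and school achievement.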

Burt 1 has attempted to apply this method to the Binet-
Simon Scale. He wishes, among other things, to determine 
how far the correlation between the intelligence test scores 
and school achievement is due to the fact that both are 
determined by native capacity, and by so doing to deter- 
mine how far this correlation is due to other circumstances, 
such as the inclusion in the test of scholastic material. The 
reader will perceive that this requires, by hypothesis, a true 
measure of native intelligence — one which is independent of 
schooling.2 Burt takes as such a measure his reasoning
test. This is regarded as a measure of native intelligence,
because it does not correlate with the standing of children
in school tests. This conclusion has two serious difficulties.
The first is that such a criterion does not give us a test which
is independent of the amount of schooling a child has received,
which is what we want, but rather of his attainment in
school, which is a very different matter. The second is that
a test which has no correlation with school achievement
cannot be a measure of intelligence as we understand it.
Intelligence is certainly manifested in part by superior school
achievement.

1 C. Burt. Mental and Scholastic Tests. London, 1922.

2 See Karl J. Holzinger and Frank N. Freeman. “The Interpretation of
Burt’s Regression Equation”; in Journal of Educational Psychology, vol. 16,
pp. 577-82. 1925. The intricacies of the question, including the ambiguity
of the term schooling, cannot be gone into here.

The independent measure which is needed is one which 
does not correlate with the amount of schooling one has had, 
on the one hand, but does give positive evidence of measur- 
ing intelligence — as indicated among other things, by 
scholastic attainment —on the other hand. That, of 
course, is just what we are looking for in our intelligence 
tests. When we get it we can test our present intelligence 
tests by means of it, but the partial correlation method will 
not give it to us, because we need it before we can calculate 
the partial correlation. 

Because of the rather wide notice it has received, and 
because it appears to give in an accurate formula the con- 
stituent factors in the Binet-Simon scale, we may note 
Burt’s next step. By using a number of partial correla- 
tions between the various factors of Binet score, School- 
ing (school achievement — not amount of schooling), In- 
telligence (Burt Reasoning score) and Age, he calculates a 
multiple correlation which purports to show the relative 
share of the factors in the Binet score. The formula is 
as follows (p. 180): 


B = .54S + .33I + .11A

Where B = Binet-Simon score
      S = School achievement
      I = Burt Reasoning score
      A = Age

In interpreting this formula Burt says, “In determining the
child’s performance in the Binet-Simon scale, intelligence 
can bestow but little more than half the share of school, and 
age but one third the share of intelligence.” 

To further inquire into the validity of this interpretation 
Holzinger has taken Burt’s data and calculated the equa- 
tions with each of the factors on the left side. Its reductio 
ad absurdum appears in the formula for age: 


A = .15B + .51S + .03I


According to Burt’s interpretation this means that, in 
determining the child’s age, Binet Score bestows somewhat 
less than a third the share of schooling. Or, to use another 
expression which he employs, a child’s age is a measure not 
only of the Binet score, but largely if not mainly of “the 
mass of scholastic information and skill which in virtue of 
attendance more or less regular, by dint of instruction more 
or less effective, he has progressively accumulated in school” 
(page 182). Correlation, including partial correlation, can- 
not be used directly to indicate causal relationships. 
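
Holzinger's procedure of placing each factor in turn on the left side of the equation can be sketched as follows. The correlation table below is hypothetical and is not Burt's data; the function names are likewise illustrative. The sketch computes standardized regression weights from a table of intercorrelations by solving the normal equations, and it shows that the arithmetic yields a set of "shares" for any variable whatever, age included.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def beta_weights(corr, names, dependent):
    """Standardized regression weights of `dependent` on all other variables."""
    index = {name: i for i, name in enumerate(names)}
    predictors = [name for name in names if name != dependent]
    a = [[corr[index[p]][index[q]] for q in predictors] for p in predictors]
    b = [corr[index[p]][index[dependent]] for p in predictors]
    return dict(zip(predictors, solve(a, b)))


# Hypothetical intercorrelations (assumed for illustration only).
NAMES = ["B", "S", "I", "A"]   # Binet, School, Intelligence, Age
CORR = [
    [1.00, 0.80, 0.70, 0.60],
    [0.80, 1.00, 0.55, 0.70],
    [0.70, 0.55, 1.00, 0.40],
    [0.60, 0.70, 0.40, 1.00],
]

for dependent in NAMES:
    weights = beta_weights(CORR, NAMES, dependent)
    terms = " + ".join(f"({w:.2f}){name}" for name, w in weights.items())
    print(f"{dependent} = {terms}")
```

The weights for age are obtained by exactly the same computation as those for the Binet score, so that if the one set is read as causal shares the other must be read in the same way. This is the point of the reductio described above.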

The second method of attempting to determine how far 
intelligence tests may be due to native endowment is one 
which does not yield an exact mathematical statement, but 
which perhaps is more convincing than the method of partial 
correlation. This method is the very simple one of deter- 
mining how children who make different scores upon intelli- 
gence tests respond to training. When we find, for example, 
that it is practically impossible, by the most strenuous efforts
which we can exert, to so train a child whose I.Q. is
below 80 that he can do ordinary high school work successfully,
we have evidence that is sufficiently convincing to the
unprejudiced observer that such an individual is deficient in
innate intelligence. When we find, on the other hand, that
a child whose I.Q. is 130 to 140 can do ordinary school work
with much less explanation than can the average child, that
he can do the work with greater quickness or facility, and 
that he can successfully undertake projects demanding con- 
siderable originality and has the ability to organize rather 
large masses of data — projects which the average child 
would be incapable of managing — there can be no reason- 
able doubt that such a child is superior by nature. It is un- 
reasonable to presume, as some critics of intelligence tests 
do, that these differences are due entirely or chiefly to 
differences in training during the first four or five years in a 
child’s life. Differences in training during this period are 
undoubtedly very potent — just how potent nobody knows; 
but the extremely stubborn character of the resistance to 
training on the part of some children who have had every 
advantage from birth up, on the one hand, and the remark- 
able response to training on the part of some children who 
have had very poor advantages until comparatively late in 
childhood or youth, on the other hand, are clear evidence that 
the larger differences, when they appear in children of about
the same experience, are fundamentally inherent.

Comparisons between individuals with a view to ascertaining
the share which endowment and training play in
their development, and in the score they make in tests, 
necessitate long follow-up studies in which careful account 
is taken of all the factors in their education and training. 
In the absence of such studies we may gain some light on our 
problem by studying the contrasts between large groups of 
persons. We turn now to these studies of groups to see what 
evidence they contain on the meaning of intelligence tests. 
Comparisons have been made between groups classified by 
occupation, by locality, by race, or by amount of schooling. 
Between all these groups marked differences have been 
found. But the discovery that these differences exist by no 
means furnishes an explanation of them. Exactly the same
differences are interpreted by one school of thought to
indicate the existence of hereditary factors, and by another 
school to demonstrate the influence of environment. We 
may first review a few typical cases, and then see whether 
we may gain any light on their meaning. 


2. Differences between vocational groups 


We may begin with a comparison that has already been 
referred to in the chapter on vocational tests — the com- 
parison between men in different occupations. 


TABLE XXXVII. THE MEDIAN SCORES IN ARMY ALPHA OF MEN
CLASSIFIED AS BELONGING IN CERTAIN OCCUPATIONS 1

OCCUPATION                       NO. OF CASES     MEDIAN SCORE

Farmer ........................      6886
General machinist .............
Railroad clerk ................
Bookkeeper ....................
Accountant ....................
Stenographer or typist ........
Mechanical engineer ...........
Civil engineer ................


1 Army Report, pp. 824-29.


It is evident that if we take the test scores at their face value 
there is a vast difference between the average capacity of 
men who are engaged in different occupations. 

It may be that this difference is due partly to the fact that
some occupations fit men directly to do well in intelligence
tests and others do not. If the differences are wholly due to 
a difference in the effect of the occupational activity itself, 
we should not expect to find that the children of men in the 
various occupations should exhibit like differences in in- 
telligence test scores. The studies which have been made, 
indicate, however, that differences of a similar sort exist 
among the children as among the parental groups. For 
example, Pressey and Ralston 1 tested 548 children and
calculated the percentage of each of four occupational groups
which tested above the median. They are as follows:


OCCUPATION               PERCENTAGE
OF FATHERS               ABOVE MEDIAN
Professional ..........      85
Executive .............      68
Artisan ...............      41
Laborer ...............      39


Other studies have yielded similar results. 

The most intensive study of the relation of occupation to 
the intelligence of children appears in Terman’s investiga- 
tion of gifted children. Terman first selected a group of 
children, all of whom had an I.Q. of about 140 or above, 
and then tabulated the occupations of their fathers and 
compared the distribution with the distribution of occupa- 
tions in the population in general. The summary result 
of this comparison is shown 2 in Table XXXVIII.

Terman also rated the occupations of the fathers of the 
gifted children according to the level of intelligence which 
they required, as measured by the Barr scale. The distribution
of their ratings in comparison with that of the population
as a whole is shown in Table XXXIX.


1 S. L. Pressey and R. Ralston. “The Relation of the General Intelligence
of School Children to the Occupation of their Fathers”; in Journal of
Applied Psychology, vol. 3, pp. 366-73. 1919.

2 L. M. Terman. Genetic Studies of Genius, p. 63. Stanford University,
1925.


TABLE XXXVIII. OCCUPATION OF 560 FATHERS OF GIFTED
CHILDREN CLASSIFIED ACCORDING TO THE CENSUS REPORT

                          Proportion       Proportion in      Per cent of
                          among fathers    population of      quota among
                          of gifted        Los Angeles and    fathers of
                          children         San Francisco      gifted children
                          (per cent)       (per cent)         (per cent)

Professional group
Public service group
Commercial group
Industrial group


TABLE XXXIX. DISTRIBUTION OF RATINGS FOR FATHERS OF
526 GIFTED CHILDREN AND FOR THE GENERAL ADULT MALE
POPULATION ACCORDING TO THE SCALE FOR RATING THE
INTELLIGENCE OF OCCUPATIONS

(Devised by F. E. Barr, p. 72)

                    ADULT MALES IN
                    GENERAL

15 or above
12-15
9-12

Taking into consideration these various lines of evidence, 
it is clear that persons in certain occupational groups have
higher test ratings than do members of other occupational
groups, that the ratings of the children of the families in 
occupational groups differ as do their parents, and that the 
occupational scale runs from the professions at one end to 
common labor at the other. Before seeking to explain these 
differences let us examine other group variations. 


3. Differences between geographical groups 

There is ample evidence that wide differences exist in the 
test scores of persons living in different localities. Perhaps 
the most striking of these are the differences between the 
standing in the army tests of recruits from the various 
States. In the Army Report is given the distribution of the 
Alpha scores of 40,530 men (whites) classified by the States 
of their residence. The medians of these distributions
(excluding the States for which there are fewer than 500
cases) have been calculated by Alexander,1 as shown in
Table XL. They have been taken from Table 200 of the
Army Report. Such startling differences, based upon such
careful and extensive measurements, indicate the presence
of some factor or combination of factors of great magnitude.


TABLE XL. THE MEDIANS OF THE STATE DISTRIBUTIONS AS
CALCULATED FROM TABLE 200 OF THE ARMY REPORT

In Table XLI are shown the distributions of the ratings 
of recruits in five Northern and five Southern States. 


TABLE XLI. THE PERCENTAGE DISTRIBUTION OF LETTER
GRADES OF WHITES IN TEN CAMPS


STATE     CAMP          NUMBER

          Grant
          Funston
          Devens
          Custer
          Upton

          Wadsworth
          Gordon
          Travis
          Meade
          Lee

Average .......


1 Herbert B. Alexander. “A Comparison of the Ranks of American
States in Army Alpha and in Social-Economic Status”; in School and
Society, vol. 16, pp. 388-92. 1922.

Similar differences appear when we compare pupils in
cities with pupils in small towns or in the country. Since 
all the studies with which the writer is familiar show the 
same differences, two illustrations will suffice. Pressey and 
Thomas 1 made a comparative study of 2800 city children
and 538 country children. They express the results in terms
of the percentage of the country children who excel the
median of the city children. The amounts are as follows:


PERCENTAGE OF COUNTRY CHILDREN MAKING SCORES ABOVE
THE MEDIAN OF THE CITY CHILDREN

Percentage ........


Book 2 made a similar comparison of high school seniors,
based on an extensive survey. His summary of the com- 
parison between the high school seniors in city and rural 
schools is given in Table XLII. While the differences are 
not so great as are those which have been reported for ele- 
mentary school children, they still persist. The lesser 
difference may indicate that at least part of the superiority 
of city children may be due to superior training, since it 
becomes less as the amount of training increases. In con- 
firmation of this suggestion, Principal R. H. Bracewell of 
Burlington, Iowa, reports, in an unpublished study, that the 
superiority in tests of city pupils on entering high school is 
reduced by the beginning of the sophomore year. 

1 S. L. Pressey and J. B. Thomas. “A Study of Country Children in (1)
A Good and (2) A Poor Farming District, by Means of a Group Scale of
Intelligence”; in Journal of Applied Psychology, vol. 3, pp. 283-86. 1919.

2 W. F. Book. The Intelligence of High School Seniors. New York: The
Macmillan Co., 1922.


TABLE XLII. PER CENT OF SENIORS FROM CITY AND RURAL
HIGH SCHOOLS SCORING AT VARIOUS INTELLIGENCE
LEVELS. BASED ON ABOUT 2400 CASES

STATE                                MEDIAN

Northern     City        60           141
             Rural       43           134
Central      City        58           141
             Rural       46           135
Southern     City        49           136
             Rural                    130


In harmony with the differences between whole sections
of the country and between the city and the rural district,
marked contrasts have been found between the more and 
less favored parts of the same city. Yerkes and Anderson,1
for example, compared the point scores of young children
in two city schools of Cambridge, Massachusetts, “which
differed radically in the social and economic status of their
pupils.” There were 54 individuals in each group and
matched pairs were selected who were approximately equal
in age. The average score of the favored boys was 37.2 and
of the unfavored boys 29.5. The average score of the 
favored girls was 41.0 and of the unfavored girls 32.6. The 
difference in each case is about 20 per cent. 
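
The figure of "about 20 per cent" can be checked from the averages just quoted. The sketch below assumes that the difference is expressed as a percentage of the favored group's score. That base is an assumption, since the text does not say which one was used, and the function name is illustrative.

```python
def percent_below(favored: float, unfavored: float) -> float:
    """Difference between the two averages, as a percentage of the favored average."""
    return 100.0 * (favored - unfavored) / favored


boys = percent_below(37.2, 29.5)
girls = percent_below(41.0, 32.6)
print(round(boys, 1), round(girls, 1))  # 20.7 20.5
```

Taken instead as a percentage of the unfavored group's score, the same differences come to about 26 per cent, so the quoted figure agrees with the favored score as the base.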

A somewhat more extensive comparison of children in a 
favored district and children in a mill district of Columbia, 
South Carolina, was made by Strong.2 The results, which
are summarized in Table XLIII, also include a comparison
of the scores of negro children. The Binet-Simon scale
was used.

1 R. M. Yerkes and Helen M. Anderson. “The Importance of Social
Status as Indicated by the Results of the Point Scale Method of Measuring
Mental Capacity”; in Journal of Educational Psychology, vol. 6,
pp. 137-50. 1915.

2 Alice C. Strong. “Three Hundred Fifty White and Colored Children
Measured by the B. & S. Measuring Scale of Intelligence: A Comparative
Study”; in Ped. Sem., vol. 20, pp. 485-515. 1913.


TABLE XLIII. DISTRIBUTION OF THE RATING OF THREE
GROUPS OF CHILDREN TESTED BY THE BINET SCALE

                                  FAVORED          UNFAVORED        COLORED
                                  WHITES           WHITES
RATING                            No. | Per cent   No. | Per cent   No. | Per cent

More than 1 year backward.....        |            11  | 18.3       21  | 25.6
Satisfactory..................    80  | 84.2       49  | 81.6       61  | 74.4
More than 1 year advanced..... 0


Our final comparison between groups which are classified 
according to place of residence is between immigrants from 
the various European countries. This comparison is based 
on the Army tests, and is worked over in terms of a combined 
scale by Brigham.1 The average scores of men coming from
the various countries are as shown in Table XLIV.


TABLE XLIV. SCORES OF VARIOUS IMMIGRANT GROUPS IN THE
ARMY SCALE


England...... 14.87     Belgium..... 12.79
Scotland..... 14.34     Ireland..... 12.32
Holland...... 14.32     Austria..... 12.27
Germany...... 13.88     Turkey...... 12.02
Denmark...... 13.69     Greece...... 11.90
Canada....... 13.66     Russia...... 11.34
Sweden....... 13.30     Italy....... 11.01
Norway....... 12.98     Poland...... 10.74


1 Carl C. Brigham. A Study of American Intelligence, pp. 120, 121.
Princeton, 1923.

In common with the other comparisons between environmental
groups these differences are susceptible to more than
one possible explanation. The proponents of Nordic race
superiority hold, first, that these various nationals can be
classified according to race into three groups, second, that 
the national groups of immigrants are fair representatives of 
their countrymen in general, and third, that the differences 
in the test scores are unqualified measures of native intelli- 
gence. On the first point there is dispute among anthro- 
pologists. On the second point we have little or no know- 
ledge. On the question whether the test scores are affected 
by other factors than native capacity we have two further 
comparisons which at least raise a doubt. 

The first of these comparisons was made by Brigham.
He gives the average scores of immigrants who have been in
the United States for different periods of time. The scores
by five-year periods are as follows:1


0-5      6-10     11-15    16-20    Over
years    years    years    years    20 years
11.41    11.74    12.47    13.50    13.82


On the racial difference hypothesis this might be due to 
an increase in proportion of immigrants from the alleged 
lower racial stocks in recent years, but a comparison of the 
proportion coming in during the first and the second decade 
of the nineteenth century indicates that this is not the case.2
The racial advocate is then forced to suppose that im- 
migrants are being drawn from progressively lower strata of 
the countries from which they come. This supposition 
seems rather strained in comparison with the simple hypo- 
thesis that being under the influence of American schooling 
and environment for periods varying from five to twenty- 
five years enables men to make a higher score than they 
otherwise would make. 


1 Brigham. Op. cit., p. 89. 2 Brigham. Op. cit., p. 163. 
This hypothesis is somewhat strengthened by the second
fact, which is stressed by Bagley.1 He found that there is 
a correlation between the ratio of elementary school enrol- 
ment to the populations of foreign countries from which 
immigrants come and the average test scores, the coefficient 
being .91. Before venturing upon further interpretation let 
us return to the inspection of other group differences. 


4. Differences between racial groups 


We turn to a comparison of the test scores of racial groups 
which are fairly clearly marked. The largest scale com- 
parison on record is that between negroes and whites in the 
army. A comparison of the letter distribution of scores in 
Army Alpha between the whites and negroes of five Northern
and five Southern camps is given in Table XLV.2


TABLE XLV. COMPARISON OF THE SCORES OF WHITES AND
NEGROES IN ARMY ALPHA

                                 SCORE IN LETTER GRADES
COMPOSITION OF GROUP             | D and D-   | C-, C and C+ | A and B
                                 | (per cent) | (per cent)   | (per cent)
Whites, five Northern camps      | 19.4       | 67.6         | 13.0
Negroes, five Northern camps     | 45.3       | 51.1         | 3.6
Whites, five Southern camps      | 34.8       |              | 8.6
Negroes, five Southern camps     | 78.7       |              |


This table shows that over twice as many negroes make 
low scores as are made by whites of the same region of the 


1 W. C. Bagley. “Army Tests and the Pro-Nordic Propaganda”; in
Educational Review, vol. 67, pp. 179-87. 1924.

2 R. M. Yerkes. Psychological Examining in the United States Army,
pp. 679, 719. Washington, 1921.

country, whereas the preponderance of whites over negroes
making high scores is still greater. The table also illustrates
further the great difference in the scores of Northern and 
Southern men. Since this difference cannot be attributed to 
race there must be some other factor besides race to account 
for the superiority of some groups over others. If this factor 
consists in whole or in part in education or some other en- 
vironmental influence, the difference between negroes and 
whites may be due in part to environment. 

A similar difference in favor of whites is found in a com- 
parison of white and negro children. In Strong’s study, 
summarized in Table XLIII, a group of negro children were 
found to stand below the white children of the mill district, 
and much below the white children of the favored district. 
A marked difference in both a verbal and a non-verbal group 
test was found by Sunny.1 She gave the Myers Mental
Measure, a non-verbal test, to 1053 white and 1113 negro 
children, and the National Intelligence Test to 5834 white 
and 1112 negro children. The percentage of negro children 
who excelled the median of the white children at each age 
was as follows: 


Sia 10} 11 | 12 


Myers Test x 20 | 24 | 22 | 22 nee 12 


National Intelligence Test | 10 | 28 | 31 | 17 | 17 | 21 | 20 | 15


From this comparison it appears that the language test 
does not place the negro children at a disadvantage. They 
stand low in both the language and the non-language tests. 
In this respect they differ from some of the foreign language 
groups. 


1 Dagne Sunny. “Comparison of White and Negro Children in Verbal
and Non-Verbal Tests”; in School and Society, vol. 19, p. 469. 1924.

A second racial group which is very distinct in the United
States is the Indian. At least three studies of the standing 
of Indians in intelligence tests have been made, and they 
show uniformly that Indians make even lower scores than 
do negroes. Furthermore, in two of the experiments the 
pure blood Indians were found to stand lowest, while those 
of mixed blood stood higher in proportion to the amount of 
white blood in their veins. The following table from 
Hunter’s 1 study is representative.


TABLE XLVI. SCORES ON THE OTIS TEST OF INDIANS OF
DIFFERENT DEGREES OF MIXTURE OF BLOOD


The Indians were students at Haskell Institute and were therefore probably a
somewhat select group. They represented 65 tribes and 
14 tribal mixtures. The comparison of their scores with 
that made by whites may be expressed in the statement 
that 85 per cent tested below age. The correlation be- 
tween the score and the amount of white blood, including 
pure whites, was found to be .51, when age and schooling 
were made constant. 

Other racial comparisons have been made by testing the 
children of various immigrant groups. If the comparison is 

1 W. S. Hunter and Eloise Sommermeier. “The Relation of the Degree of
Indian Blood to Score on the Otis Intelligence Test”; in Journal of Comparative
Psychology, vol. 2, pp. 257-77. 1922.

made only between children of foreign born parents and
American children of American parents the effect of limited 
familiarity with the English language upon the children’s 
achievement in the tests must be considered; but this 
difficulty is not present when we compare children of differ- 
ent national origins with each other. When certain national 
or racial groups fall consistently low and others stand con- 
sistently well up, and when they are similar in respect to 
possible language handicap and unfavorable social environ- 
ment, there appears to be good evidence that an inherent 
racial difference exists. 

A typical study is the one by Murdoch.1 She compared
boys of four groups in the Pressey Group Scale of Intelligence.
The results are shown in Table XLVII.


TABLE XLVII. MEDIAN SCORES OF CHILDREN OF FOUR
RACIAL OR NATIONAL GROUPS


The Italian group is seen to stand lower than any of the 
other groups. Italian children and Polish children are found 
uniformly to stand low in studies of this sort. In this study 
but 15 per cent of the Italian children equaled the median 
score of the Jewish and American children. 

The conviction that such differences as these are not 


1 Katherine Murdoch. “A Study of Race Differences in New York City”; 
in School and Society, vol. 11, pp. 147-50. 1920. 
wholly due to language or environment is strengthened by
the fact that Chinese children in the United States stand
fairly high in tests. Pyle 1 gives the following comparison,
showing the per cent which the Chinese children’s average
score is of the American children’s average score.


                   Boys     Girls
Rote memory        117.0    108.3
Logical memory      87.3     94.7
Substitution        88.6     77.9
Analogies           36.0     26.8
Spot pattern        90.4


Pyle writes that the mentality of the Chinese children is 
much nearer the norm for American white city children 
than is that of negro children or of rural whites. He believes
that if allowance were made for language difference the
mentality of the Chinese children would equal that of American whites.
Other studies give support to this belief. 


5. Differences between groups with various amounts of 
schooling 

We have had illustrations of differences between persons 
working at different occupations, living in different places or 
belonging to different races. Our additional comparison will 
lead us directly to the interpretation of the facts which have 
been reviewed. It has been pointed out repeatedly that there 
is a correlation between the amount of schooling an indi- 
vidual has had and his standing in intelligence tests. The 
army tests give us the most extensive data on this point as 
on many others. In Table XLVIII 2 are compiled the
median scores of groups of men who are shown by their 

1 W. H. Pyle. “A Study of the Mental and Physical Characteristics of
the Chinese”; in School and Society, vol. 8, pp. 264-69. 1918.

2 From the Army Report, assembled from tables on pp. 706, 767, 768,
and 770.

records to have completed respectively four school grades,
eight grades, the high school, and college. This comparison
is made for five groups of men. 


TABLE XLVIII. THE AVERAGE SCORES IN ARMY ALPHA MADE
BY MEN WITH DIFFERENT AMOUNTS OF SCHOOLING

GROUP                  | Grades | Grades | High   | College | Beyond  | TOTAL
                       | 0-4    | 5-8    | school |         | college |
White officers         | 112.5  | 107.0  | 131.1  | 143.2   | 143.5   | 139.2
White draft native     |        |        |        | 117.8   | 145.9   |
White draft foreign    |        |        |        |         |         |
Colored draft, North   |        |        |        |         |         |
Colored draft, South   |        |        |        |         |         |


The outstanding fact which is revealed in this table is that, 
with one exception in the case of the officers, the men who 
have had more schooling make the higher scores. This 
appears when we compare the averages in the various 
horizontal rows, looking from left to right. The table also 
gives us a comparison between the different groups of men, 
which can be made by running up and down the vertical
columns; but our primary concern is with the first comparison.


6. Interpretation of the various group differences 
In our effort to get an explanation of the various group 
differences which have been shown, and by so doing to gain 
light on the fundamental meaning of intelligence tests, let us 
begin with the last comparison. The two contrasted views
of the meaning of the tests are well represented in the explanations
which are offered of this simple fact. The first
view is the apparently simpler one that persons with greater 
amounts of schooling make higher scores because their 
schooling raises their intelligence, that is, increases the ability
which is measured by the tests. The mere mathematical
fact of correlation, of course, does not indicate which is
cause and which is effect, but the suggestion which comes 
most naturally to mind is that schooling is the cause and 
intelligence is the effect. This is the explanation which 
many hold to be the true one. 

Many psychologists, on the other hand, believe that the 
explanation which is apparently the simpler is proven by 
other facts not to be the true one. We know, for example, 
that pupils drop out of school in part because of a limitation 
in their ability, which makes it difficult, if not impossible, 
for them to go on. One psychologist, Pillsbury, regards 
this selective elimination as of such importance that he con- 
siders the most important function of the school to be to 
pick out the intelligent individuals and put them in positions 
of leadership, rather than to teach them.1 The view that
the chief factor in the correlation between intelligence and 
amount of schooling is native capacity is further supported 
by the other evidences that intelligence test scores are de- 
pendent chiefly on native capacity. We may proceed to 
consider some of the other evidences which lie in the data 
before us.

Consider first the exception in the above table which 
appears in the case of the white officers. Those who have
had only four years or less schooling stand higher than those 
who have completed five to eight grades. This fact is 
puzzling on the hypothesis that it is education which is 
solely responsible for high scores. We seem here to have a 
few very intelligent individuals who, in spite of extremely 


1 W. B. Pillsbury. “Selection — An Unnoticed Function of Education”;
in Scientific Monthly, vol. 12, pp. 62-74. 1921.

limited education, are chosen as officers and make relatively
high scores on the test. On further examination, however, 
there is evidence in these data that the amount of education 
affects the score. This group of men with very little school- 
ing must have been at least as gifted by nature as the aver- 
age of the officers. Otherwise they would not have been 
made officers in spite of their meager education. Their low 
score in comparison with the high school and college stu- 
dents, then, must be due to their lack of education. 


FIG. 31. DISTRIBUTION OF ALPHA SCORES OF OFFICERS AND
ENLISTED MEN WITH MORE THAN EIGHTH-GRADE OR LESS
THAN EIGHTH-GRADE SCHOOLING

a, All enlisted men
b, Enlisted men, over 8th Grade
c, Officers, below 8th Grade
d, All officers

(From Army Report, pp. 765, 779.)


The same reasoning applies to the “crucial test” of the 
army tests which is described by Brigham.1 Brigham cites
the Army Report (pp. 778, 779), which shows that 660 officers 
who had not gone beyond the eighth grade in school make 
slightly higher scores on the average than 13,943 native born 
recruits (not officers), all of whom had gone beyond the 
eighth grade. The officers, in spite of their lack of education,
made slightly higher scores than enlisted men with
more education.

1 Carl C. Brigham. Op. cit., pp. 68 ff.

This fact does adequately prove that native capacity 
is a large factor in the intelligence test; but it shows with 
equal clearness that education is also a factor. The officers 
in question must have been by nature even more intelligent 
than the average officer. Now officers in general made so 
much higher scores than enlisted men that while 83 per cent 
of the officers received A and B grades, but 18.8 per cent of 
the enlisted men made these grades.1 The median score of
the officers with less than eighth grade education was 107.3, 
while the median score of the enlisted men with more than 
eighth grade education was 97.4, a difference of only 9.9 
points. When their education is equal, the superiority of the 
officers is very much greater. The group of officers who 
completed high school (see Table XLVIII) had an average
score of 131.1 while the corresponding group of enlisted men 
had an average score of 92.1, a difference of 39 points. De- 
ficiency in education reduced the first group from a superior- 
ity of 39 points to a superiority of only 9.9 points. 
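
The arithmetic of this argument can be checked with a minimal sketch. The four scores are those quoted in the paragraph above; the variable names are introduced here for illustration only. Note that the text sets a difference of medians beside a difference of averages, so the comparison is approximate.

```python
# Army Alpha scores quoted in the paragraph above.
officers_below_8th = 107.3    # median, officers with less than eighth-grade schooling
enlisted_above_8th = 97.4     # median, enlisted men with more than eighth-grade schooling
officers_high_school = 131.1  # average, officers who completed high school
enlisted_high_school = 92.1   # average, enlisted men who completed high school

# Superiority of the officers when their schooling is the lesser.
gap_unequal_schooling = round(officers_below_8th - enlisted_above_8th, 1)

# Superiority of the officers when schooling is equal.
gap_equal_schooling = round(officers_high_school - enlisted_high_school, 1)

# Part of the superiority that disappears with deficient schooling.
reduction = round(gap_equal_schooling - gap_unequal_schooling, 1)

print(gap_unequal_schooling, gap_equal_schooling, reduction)
```

On these figures roughly three quarters of the officers' superiority disappears when their schooling is deficient.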

The “crucial” comparison of the privates and officers in
the army is extended in Fig. 31 so as to include four groups:
all enlisted men, enlisted men of more than eighth-grade
education, all officers, and officers of less than eighth-grade
education. We
can now make three comparisons. First, we can compare 
enlisted men in general (Curve a) with officers in general 
(Curve d). This comparison yields us little of value, since 
the two groups differ both in schooling and intelligence, 
and it is impossible to tell how much the difference in scores 
is due to the one and how much to the other. In the second 
place we can compare men of the same rank but of different 
amounts of schooling. It is evident that the enlisted men 


1 Yoakum and Yerkes. Army Mental Tests, p. 27. Henry Holt & Co., 
1920. 
with superior schooling (Curve b) make much higher scores
than all enlisted men, whose schooling is much less (Curve 
a). Similarly the officers with inferior schooling (Curve c) 
make much lower scores than officers in general, whose 
schooling is much more (Curve d). Obviously schooling 
makes a large difference in the scores. Finally, we can 
compare officers with little schooling (Curve c) with enlisted
men of much schooling (Curve b). The higher scores of the officers
indicate that native capacity makes a large difference in 
the scores. These two latter comparisons indicate unequi- 
vocally that the Army Alpha scores were affected, perhaps 
about equally, by differences in schooling and by differences 
in intelligence. 

There is further evidence that the scores in intelligence 
tests are affected, on the one hand, by education and envi- 
ronment, and that, on the other hand, they do measure, to 
a large degree, differences in native capacity. On the side of 
education consider the following facts: 


7. The influence of education and environment 
The application of our intelligence tests to children of 
foreign extraction commonly reveals a contrast between their 
standing in language tests and in non-language tests. The 
following example will suffice. The great superiority of the 


TABLE XLIX. COMPARISON OF 81 ENGLISH-SPEAKING AND
129 FOREIGN CHILDREN IN SEVERAL TESTS

MEASURE                 | ENGLISH-SPEAKING | FOREIGN | PER CENT DIFFERENCE
School achievement      |                  |         |
Pintner non-language    |                  |         |
Otis Intelligence Test  |                  |         |

1 From an unpublished master’s thesis, by Clifford R. Maddox.




children from English-speaking homes in the Otis Test as 
contrasted with their very slight superiority in the Pintner 
Non-Language Test indicates that home environment is a 
factor in some of our intelligence tests at least. The rela- 
tively good school achievement of the foreign children in- 
dicates that some of the handicap which affects their test 
score may be overcome in their school work. 

The lessening difference between city children and rural 
children as they progress through the school, which has 
already been noted in the studies of Book and of Brace- 
well, indicates that the type of handicap which rural 
children suffer in intelligence tests is in part due to edu- 
cation and training and is not wholly a matter of native 
capacity. 

It is hardly conceivable that the enormous differences in 
intelligence test scores between various sections of the coun- 
try can be due to native capacity alone. The difference 
between the Mississippi average and the Oregon average — 
a difference of 100 per cent — would argue two distinct types 
of human beings. It is equal to the difference found by 
Hunter between pure blood Indians and whites — a differ- 
ence which itself is probably due in part to training. 
Furthermore, the differences due to geographical location,
as pointed out by Bagley,1 frequently run directly
counter to the racial hypothesis which is advanced to ex- 
plain their inherent nature. Massachusetts and Con- 
necticut, which stand near the top in the army tests, have 
a very large proportion of foreign stock. The Southern 
States, which stand uniformly low in the army tests, have 
the purest Nordic stock in the United States. The average 
score of all Northern negroes combined is higher than the 
average score of the whites of Mississippi, Kentucky, and 


1 W. C. Bagley. “Army Tests and the Pro-Nordic Propaganda”; in
Educational Review, vol. 67, pp. 179-87. 1924.

Arkansas. Such facts can very readily be explained by the
hypothesis that the scores in intelligence tests are affected 
to a considerable degree by education. They can be har- 
monized with the view that the tests measure native capacity 
solely only by the most strained assumption of selection — 
of a migration of the intelligent persons to some parts of the 
country and of the dull persons to other parts. 

If this reasoning is sound, it is at least possible that the 
differences between occupational and racial groups may be 
partly due to education. This conclusion becomes probable 
when we find that the manner of life of certain of the groups 
which stand low involves fewer activities which are similar 
to those which are required in taking intelligence tests than 
does the manner of life of those which stand high. Com- 
pare, for example, the farmer and the railroad clerk. It is 
obvious that the clerk is required by his occupation to do 
things that are more like the things which are done in taking 
an intelligence test than is the farmer. Is it not probable 
that this fact is responsible for part of the difference between 
the clerk’s score of 91.4 and the farmer’s score of 48.3? 
Again compare the bookkeeper, with a score of 100.9, and 
the general machinist, with a score of 62.8. Finally take 
the stenographer and typist with a score of 115.0 in com- 
parison to the mechanical engineer with a score of 109.7 
and the civil engineer with a score of 116.8. 

In the comparisons between races, wherever it is clear 
that there is a marked difference in education, we must 
ascribe part of the difference in scores to this fact. This 
necessity impairs the validity of the distinctions between 
immigrants who have come at different periods or, possibly, 
between immigrants from different countries. We must 
also make some allowance for differences in education in 
comparing negroes and Indians with whites. 


8. The influence of native capacity

It is clear that we cannot ignore differences in education 
in comparing the standing of various groups of persons in 
intelligence tests. None of the evidence which has been 
adduced, however, indicates that education is the only, or 
perhaps even the major, factor. There are positive facts on 
the other side which indicate that native capacity is a large 
factor. 

Just as the very large differences between geographical 
groups, because of their very magnitude, seem hardly sus- 
ceptible of explanation entirely on the ground of native 
capacity, so the very large differences between occupational 
and racial groups are difficult to explain wholly on the 
ground of differences in education. This explanation is 
conceivable, but hardly plausible. 

We can go much further than this. When we compare 
racial groups which have had the same educational advan- 
tages and live in a very similar environment, and find a 
persistent difference between them, as in Miss Murdoch’s 
study, we have positive evidence of a native superiority of 
certain races in intelligence. Our measurements are not yet 
sufficiently refined to say just how great this difference is, 
and we should not lose sight of the fact that even in the case 
of the largest differences there is a good deal of overlapping 
between the groups, but that a real and important difference 
exists can hardly be questioned. 

Further supporting evidence is supplied by the compari- 
son of mixed bloods. The case of Indian mixed bloods, 
already cited, is matched by a similar progressive difference 
in the case of mulattoes, as studied by Ferguson.1 It is
conceivable, again, that gradations in social status run 
parallel to degrees of racial mixture and that these account 


1 G. O. Ferguson. “The Mental Status of the American Negro”; in
Scientific Monthly, vol. 12, pp. 533-43. 1921.

entirely for the differences in scores, but the burden of proof
is on such an assumption. 

Again, the study of twins, while not absolutely conclusive, 
strengthens the case for native distinctions. For example, 
Thorndike 1 found that the scores of twins in mental tests
correlate with each other about twice as closely as do the 
scores of brothers and sisters in general. It seems hardly 
likely that the similarity in the educations of twins is enough 
greater than that of ordinary siblings to account for this 
greater similarity in performance. 

Finally, the observation of individual cases gives convinc- 
ing evidence that differences measured by intelligence tests 
are to a considerable degree native, and are neither caused 
by nor removable by education. A case is described by 
Foster of a boy who had had the most meager education, 
had been kept in seclusion for the greater part of his life, and 
who was very timid from ill use, and yet who tested ap- 
proximately normal. On the other side are numerous cases 
of children who have received every advantage known to 
modern education, and yet who are hopelessly deficient as 
measured by intelligence tests and by the requirements of 
school and life. 


9. Summary 

This detailed examination of the scientific evidence which 
is at hand indicates the correctness of the moderate view 
as contrasted with either extreme. As was pointed out in 
the first chapter, one may regard intelligence tests as an 
entirely new and perfect instrument for detecting native
capacity. At the other extreme he may discount them and 
regard them as merely somewhat improved instruments for 
measuring the results of teaching. The consideration of the 


1 E. L. Thorndike. Measurements of Twins. Archives of Philosophy,
Psychology, and Scientific Methods, Columbia University, 1905.

historical development of tests, in common with an analysis
of their results, shows that neither of these views is correct. 
Intelligence tests have made a marked advance toward the 
measurement of native capacity, but their scores are still 
influenced to a considerable degree by the effects of training, 
and in their interpretation this influence must always be 
taken into account. 


CHAPTER XVIII 
THE NATURE OF INTELLIGENCE 


THE designers of mental tests have frequently said that it is
not only unnecessary but probably futile to try to discover
the nature of intelligence. Physicists have been able, it is
pointed out, to measure electricity without knowing its
nature, and, it is asserted, we can do the same with intelli- 
gence. 

It is unlikely that the argument by analogy throws much 
light on the value of the quest for information regarding the 
nature of intelligence, but even the analogy of the physical 
sciences should suggest that it is productive to analyze the 
facts which appear on the surface and to penetrate, so far as 
we may be able, into the deeper constitution of the material 
with which the science deals. In psychology, then, we may 
expect to derive profit from an examination of the nature of 
intelligence. 

The first question to be raised concerning the nature of 
intelligence is whether intelligence is a single and unitary 
capacity, or whether it is a composite of several or of many 
capacities. We have already seen in Chapter IX that 
Spearman and others regard the facts of intercorrelation
between tests as giving evidence of a general factor. We 
may now follow up the theory of this school a little further. 

In his first study of the hierarchical arrangement of co- 
efficients of intercorrelation between tests, Spearman took 
as a measure of the approach to a perfect hierarchy the de- 
gree of correlation between the series of correlation coeffi- 
cients which occupy the several columns of the table.! 


1 B. Hart and C. Spearman. “General Ability, its Existence and
Nature”; in British Journal of Psychology, vol. 5, pp. 51-84. 1912.

Take, for example, columns one and two, Table VI, p. 75,
which represent the dotting test and the alphabet test. The 
assumption of the hierarchy of intelligences is that if a par- 
ticular test — say the card-sorting test — correlates highly 
with the dotting test in column one, it will also correlate 
highly with the alphabet-finding test in column two. On the 
other hand, if another test, say the memory test, correlates to 
a low degree with the dotting, it will also correlate to a low 
degree with the alphabet-finding. Now, if this rule holds 
throughout, there will be a high correlation between the 
coefficients of column one and column two. Similarly, there 
will be a high correlation between all of the other columns. 
Further, if the tests are arranged in the descending order of 
their average correlation with all the tests, they will also 
be in a descending order with respect to the correlations with 
each individual test, as has already been said. Hart and 
Spearman find the correlation between columns of tables of 
this sort to be in general fairly high. They range from .58 
to .98. This they consider to have a very significant bear- 
ing upon the constitution and relationship of mental ability. 
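
The column criterion just described can be sketched as follows. The table of intercorrelations is hypothetical and is not taken from Table VI; the function names are likewise introduced here for illustration. The sketch assumes that, in correlating two columns, the rows belonging to the two tests themselves are left out, since a test's correlation with itself is not an observed coefficient.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def column_correlation(table, i, j):
    """Correlate column i with column j, skipping rows i and j themselves."""
    rows = [k for k in range(len(table)) if k not in (i, j)]
    return pearson([table[k][i] for k in rows], [table[k][j] for k in rows])

# Hypothetical intercorrelations of five tests (assumed values for illustration).
HYPOTHETICAL = [
    [1.00, 0.80, 0.70, 0.60, 0.50],
    [0.80, 1.00, 0.65, 0.55, 0.45],
    [0.70, 0.65, 1.00, 0.50, 0.40],
    [0.60, 0.55, 0.50, 1.00, 0.30],
    [0.50, 0.45, 0.40, 0.30, 1.00],
]

# Correlation between the first and third columns of the hypothetical table.
r_first_third = column_correlation(HYPOTHETICAL, 0, 2)
print(round(r_first_third, 3))
```

A value near 1 for every pair of columns is what an approach to a perfect hierarchy would look like under this criterion.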

The matter may be simplified by considering only four 
abilities, for example the first four in Table VI. The in- 
tercorrelations are as follows: 


                        3             4
                        CARD-         IMPUTED
                        SORTING       INTELLIGENCE
1. Dotting              .67           .60
2. Alphabet-finding     .74           .61


Test 1 correlates more closely with other tests in general 
than does test 2. It should then have a higher correlation 
with a particular test than should test 2. That is, the corre- 
lation between 1 and 3 should be higher than the correlation 
of 2 with 3. Similarly the correlation of 1 with 4 should be 
higher than the correlation of 2 with 4. Moreover the relations
of these correlations should be proportional, thus: 1

$$\frac{r_{13}}{r_{23}} = \frac{r_{14}}{r_{24}}$$

The reason for this proportionality and the explanation of 
its existence are found in the two-factor theory, which will be 
described in a moment. 

The above equation may be expressed thus: 


$$r_{13} \cdot r_{24} - r_{14} \cdot r_{23} = 0$$


The differences given by this formula are called tetrad
differences. In the example drawn from Burt the tetrad
difference would be:

.67 × .61 − .60 × .74 = −.0353
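
The computation may be verified with a short sketch. The four coefficients are those of the small table above; the function name is introduced here for illustration.

```python
def tetrad_difference(r13, r24, r14, r23):
    """Tetrad difference r13 * r24 - r14 * r23 for four tests."""
    return r13 * r24 - r14 * r23

# Coefficients from the table: dotting (1), alphabet-finding (2),
# card-sorting (3), imputed intelligence (4).
td = tetrad_difference(r13=0.67, r24=0.61, r14=0.60, r23=0.74)
print(round(td, 4))
```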


Apparently the relations are not proportional, as the 
theory would presuppose, but the use of this single example 
assumes that every coefficient is a true measure of the corre- 
lation in question. In other words, it neglects the error of 
sampling.2 We should not expect the tetrad differences as
worked out by the above formula to be all zero. We should
expect their average to be zero, and their actual probable
error to agree with the theoretical probable error which is
to be expected from sampling alone.
Spearman worked out the theoretical and actual probable 
errors for Simpson’s table of intercorrelations and found 
them to be respectively .061 and .062. From another table 
of intercorrelations between measures of physical traits, 
which we should not expect to be arranged in the form of a 
hierarchy, he found the actual and theoretical probable 
errors to be respectively .089 and .011. 

1 C. Spearman. “Some Issues in the Theory of ‘G’ (Including the Law
of Diminishing Returns)”; in Proceedings, British Association, Section J.
Southampton, 1925.

2 C. Spearman and K. Holzinger. “The Sampling Error in the Theory of
the Two Factors”; in British Journal of Psychology, vol. 15, pp. 17-19. 1924.

These calculations seem to give statistical evidence of the
existence of some sort of general factor in mental tests, which 
is responsible for the fact that the tests are intercorrelated 
with each other in a systematic fashion, some having rela- 
tively high and some relatively low intercorrelation with 
other tests.! This general factor was originally called general 
intelligence. Spearman now prefers to designate it by a 
symbol “G,”’ since he thinks of it as a more abstract element 
of intelligence than is usually meant by the term intelligence 
itself. 

The general factor alone cannot account for intellectual 
achievement and for mental test scores. If it were the sole 
factor, test scores should all be perfectly correlated, except 
for the sampling error, and should, of course, all be equally 
correlated. To account for the fact that some intercorrela- 
tions are low, even approaching zero, Spearman assumes a 
host of particular factors, unrelated to the general factor
and entirely independent of each other. Keenness of vision, 
for example, would depend chiefly on a particular factor, 
uncorrelated with other particular factors like keenness of 
hearing, and also uncorrelated with “G.” 

This theory is called the “two-factor theory.” The 
theory may be further described thus. One’s response to a 
test situation is determined by two factors, or sets of factors. 
One of these factors is common to all the various responses. 
This is general intelligence or “G.” The other factor is 
specialized, and varies from one test situation to another. 
Again, various test situations demand different degrees of 
general intelligence, or “G.” In some responses this general
factor is the more important; in others the special factor 
is the more important. 
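
The connection between the two-factor theory and the vanishing tetrad difference can be illustrated by a small simulation. The sketch below is not Spearman's procedure; it assumes that each test score is a weighted sum of a general factor shared by all tests and a special factor drawn independently for each test, with hypothetical weights (`loadings`) chosen only for illustration. Under that assumption the correlation between any two tests is the product of their weights, so every tetrad difference is zero apart from sampling error.

```python
import math
import random


def simulate_scores(loadings, n_persons, seed=0):
    """Simulate test scores under an assumed two-factor model.

    Each score is a * g + sqrt(1 - a*a) * s, where g is common to all
    tests of one person and s is drawn afresh for every test.
    """
    rng = random.Random(seed)
    columns = [[] for _ in loadings]
    for _ in range(n_persons):
        g = rng.gauss(0.0, 1.0)  # general factor of this person
        for column, a in zip(columns, loadings):
            s = rng.gauss(0.0, 1.0)  # special factor for this test
            column.append(a * g + math.sqrt(1.0 - a * a) * s)
    return columns


def pearson(x, y):
    """Product-moment correlation of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


loadings = [0.9, 0.8, 0.7, 0.6]  # hypothetical dependence of each test on "G"
t1, t2, t3, t4 = simulate_scores(loadings, n_persons=20000)
tetrad = pearson(t1, t3) * pearson(t2, t4) - pearson(t1, t4) * pearson(t2, t3)
print(round(pearson(t1, t2), 2), round(tetrad, 3))
```

With these weights the correlation of the first two tests should come out near .9 × .8 = .72, and the tetrad difference should differ from zero only by an amount attributable to sampling. Tests that depend little on the general factor would correspond to small weights and hence to low intercorrelations.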

¹ For the demonstration that such a relation requires the assumption of
two factors, see C. Spearman. Proc. Roy. Soc. A, vol. 101, pp. 97-100.
1922.

The above theory has been sharply criticized by Thorndike.
Writing in 1913, he called attention to the general
fact of positive correlation between desirable traits, but 
rejected the hypothesis of a single common factor as being 
responsible for this correlation. It is an error to suppose 
“that some one function is shared by all intellectual traits, 
and that whatever resemblances or positive correlations the 
traits show are due to the presence in each of them of this 
function as a common factor.” “A table of the known de- 
grees of relationship would abundantly confirm the state- 
ment that the mind must be regarded not as a functional 
unit, nor even as a collection of a few general faculties which 
work irrespective of particular material, but rather as a mul- 
titude of functions each of which involves content as well as 
form, and so is related closely to only a few of its fellows, to 
the others with greater and greater degrees of remoteness.” 

Thorndike elsewhere subscribes to the distinction between 
abstract intelligence, social intelligence, and mechanical in- 
telligence, or the capacity to deal respectively with sym- 
bols, persons, and things. This is a rather widely current 
distinction but it lacks statistical support from the results 
of tests, and it is at least possible that the variations in 
abilities to deal with symbols, persons, and things may be 
explained in another way. 

Social intelligence, for example, may be explained as a 
combination of intelligence and of other traits which are 
part of the instinctive or temperamental constitution of the 
individual. Success in dealing with people and in making a 
favorable impression upon them is due to a marked degree 
to such qualities as amiability, lack of timidity, a liking for 
people, a moderate degree of aggressiveness, and even to
personal appearance. These traits are partly native and
partly acquired, and they are all distinct from intelligence.
True intelligence, and such personality traits as these, combine
to make up “social intelligence”; but this is probably
a composite of several traits and not another kind of
intelligence.

¹ E. L. Thorndike. Educational Psychology, vol. 3, pp. 363, 366. Columbia
University, 1914.

Similarly, mechanical intelligence may be explained as a
composite of true intelligence, of manual skill, possibly of a
special aptitude for thinking in terms of concrete objects
rather than of symbols, and of training. That some persons
have special aptitude for apprehending relationships among 
concrete objects and others among symbols is probably true. 
In fact there may be even finer distinctions than these, 
since, for example, some appear to excel in the use of mathe- 
matical symbols, and others in the use of oral language sym- 
bols, or words. But these differences are in the material in 
which thought is represented, not in the nature of the think- 
ing activity itself. The process of apprehending relation- 
ships may be fundamentally the same whether the relation- 
ship is between one kind of object or another. 

In spite of his critical strictures, Thorndike, in common 
with other psychologists, speaks of intelligence as though it 
were something besides a mere average of many particular 
and independent capacities. He draws distinctions, for 
example, between classes of abilities, some of them less, 
and some more intellectual. When he comes to a definition 
he describes intelligence in terms of response or behavior 
rather than in terms of an analysis of the mental processes 
involved. 

The issue between these two schools of thought as to 
whether intelligence is a unitary factor, or has a unitary 
factor underlying it, lies, perhaps, still in the field of debate. 
The writer believes that the statistical evidence, so far as it 
goes, favors the notion that intelligence rests upon some 
more or less unitary factor. In addition to the evidence from
intercorrelations that there is a hierarchy among intellectual
capacities, we have the fact which was cited in the discussion 
of the technique of construction of the army tests. It will 
be remembered that the army psychologists sought to find 
constituent tests which would correlate slightly with each 
other — so as to tap the various sorts of intelligence — and 
at the same time correlate highly with outside criteria, but 
they found that when they got tests which correlated highly 
with criteria they also correlated highly with each other. 
This seems to point to a certain unity in intelligence. 

We may pass from this question regarding the unity or 
multiplicity of intelligence to the question of its further 
description, definition, or analysis. Attempts to define 
intelligence have been made in various connections. The 
general psychologist has attempted by analysis to discover 
the marks of intelligent behavior as distinguished from instinctive
or habitual behavior. The physician, the scientist,
and the legislator who are interested in mentally defective
individuals have sought to define intelligence in order to be 
able to describe accurately wherein such persons are inferior. 
The psychologist who is engaged in designing intelligence 
tests attempts to define intelligence in order that he may 
have a preliminary criterion to guide him in the selection of 
subject-matter. Finally, the interpreter of the scores of 
intelligence tests wishes to know as precisely as possible in 
what respect individuals differ when they make different 
scores on the test. Contributions to our concept of intelli- 
gence as a characteristic in which individuals differ from each 
other have been made by these various types of workers. 

At the outset of our survey of the various definitions it will 
be well to distinguish between the technical and the non- 
technical use of the term intelligence. When we speak of the 
Intelligence Division of the army we obviously use the word 
in a different sense from that which we employ when we
speak of intelligence tests. The Intelligence Division has as
its duty to gather intelligence, that is, information. An 
intelligent person, in this sense, is a person who is well- 
informed. In accordance with the universal lack of precision 
of popular terms, the term intelligence has also another mean- 
ing. It suggests native wit, brightness, keenness, capacity 
to learn. When the term was adopted as a technical term in 
psychology it was given the second meaning, divorced from 
the first. This technical use of the word is now well estab- 
lished. 

Another distinction will be serviceable at this point. 
Descriptions of intelligence may have two rather distinct 
purposes. When William James, for example, describes 
the manner in which a person sets about to discover the 
cause of a smoking lamp and the remedy for the defect, he 
does so in order to set forth the essential nature of intelli- 
gent behavior, to get at the aspects of intelligence which 
characterize it wherever it manifests itself, whether in high 
or low degree. Similarly, when Spearman describes intelli- 
gence as consisting in relational thinking — the apprehension 
of experience, the eduction of relations and the eduction of 
correlates — he is telling what intelligence consists of, but 
not how different degrees of intellectual capacity are constituted.¹
Thurstone’s² classification of intelligence into
four types (trial and error, perceptual, ideational, and
conceptual) is another example of the same sort of description.
This may be called, to use Spearman’s term, a
qualitative account of intelligence. Contrasted with it is the
quantitative account, which is for the purpose of explaining
differences in degree of intelligence among different individuals.
In our descriptive account of intelligence for the
purposes of mental tests we must not stop with a qualitative
statement, but must show how differences in degree of intelligence
in different individuals may be explained. In a
discussion of mental tests we are, of course, interested
primarily in the quantitative analysis of intelligence, and
descriptions or definitions are pertinent only in so far as they
may throw light on individual differences.

¹ C. Spearman. The Nature of “Intelligence” and the Principles of
Cognition, chaps. I, IX, and XXI. London: Macmillan & Co. 1923.

² L. L. Thurstone. “The Nature of General Intelligence and Ability”;
in British Journal of Psychology, vol. 14, pp. 243-47. 1924.

We may first notice a group of definitions which are 
couched in terms of behavior. The first definitions had 
reference directly to mental deficiency, but the principle on 
which they are based might be extended to include all the 
range of differences. In the words of the British Royal 
Commission, “A feeble-minded person is one who is capable 
of earning a living under favorable circumstances, but is 
incapable, from mental defect existing from birth, or from 
an early age, (a) of competing on equal terms with his normal 
fellows; or (b) of managing himself and his affairs with 
ordinary prudence.” If we extended this type of definition 
it would mean that intelligence is measured by the degree of 
success with which the individual manages the affairs of his 
life. 

This criterion, while it may be fairly satisfactory as a basis 
for determining what individuals should be given custodial 
care by society, is obviously lacking in scientific precision. 
It may include a variety of quite different traits under the 
same category. Relatively dull persons may succeed in
making a living by virtue of unusual caution, persistence, 
and disposition to conform to the rules of living laid down by 
society. On the other hand, truly brilliant persons some- 
times make a miserable failure because of violent passions or 
psychopathic disposition. 

We turn next to a series of definitions, which, while they 
define in terms of behavior, have made a sharper analysis of
what constitutes intelligent behavior. Thus Thorndike
writes, “We may . . . define intellect in general as the power
of good responses from the point of view of truth or fact.”¹
A group of definitions make the criterion, as does James in
his general description of intelligent behavior, the ability
to adjust oneself successfully to a relatively novel situation.
Thus Burt defines intelligence as “the power of readjustment
to relatively novel situations by organizing new psycho-physical
combinations”;² Stern calls it “a general capacity
of an individual consciously to adjust his thinking to new
acquirements: it is general mental adaptability to new
problems and conditions of life”;³ Binet, as quoted by Terman,
describes it as “(1) the tendency of thought to take and
maintain a definite direction, (2) the capacity to make adaptations
for the purpose of attaining the desired end, and (3)
the power of self-criticism.”⁴ Colvin considers that intelligence
can be summed up in terms of behavior as capacity to
learn. He defines it as “a group of innate capacities by
virtue of which the individual is capable of learning in a
greater or less degree in terms of the amount of these innate
capacities with which he is endowed.”⁵

These definitions do undoubtedly select the aspect of
behavior in which intelligent persons characteristically excel
the unintelligent. The concept which they contain also is of
value as a guide in constructing tests of intelligence. Unless
one is a doctrinaire behaviorist, however, he will find it useful
to go beyond this external view and seek to find in the mental
processes themselves that which is characteristic of superiority
in intelligence.

¹ E. L. Thorndike. “Intelligence and its Measurement: A Symposium”;
in Journal of Educational Psychology, vol. 12, p. 124. 1921.

² C. Burt. “Experimental Tests of General Intelligence”; in British Journal
of Psychology, vol. 3, p. 168. 1909.

³ W. Stern. Psychological Methods of Testing Intelligence, tr. by G. M.
Whipple, p. 3. Baltimore, 1914.

⁴ L. M. Terman. The Measurement of Intelligence, p. 45. Boston, 1916.

⁵ S. S. Colvin. “Principles Underlying the Construction and Use of

In this realm we may begin by a process of exclusion. 
Intelligence is not physical strength or endurance, nor is it 
power of prolonged mental activity. It is not emotional 
stability or the absence of psychopathic tendencies. It is 
not moral disposition or good character. It is not decisiveness
or “strength of will.” All these traits may be, and
probably are, correlated with intelligence. Further, they
undoubtedly help to make intelligence effective, but in- 
telligence is distinct from them in its meaning, and may exist 
independently of them in fact. 

In the earlier attempts to define intelligence in psy- 
chological terms positively, it was common to try to identify 
it with some one of the particular mental processes. Binet, 
as we have seen in an earlier chapter, thought for a time that 
attention was the essence of intelligence, and sought for a 
measure of intelligence in tests of attention. Ebbinghaus 
thought of intelligence as the ability to combine or to asso- 
ciate the elements of experience, and to perceive their re- 
lationship to one another. Spearman, on the other hand, 
thought at one time that intelligence consisted in the ability 
to discriminate fine differences. 

The results of mental tests have shown that no particular, 
single mental process can be identified as of the essence of 
intelligence. All of these and more are involved to some 
degree in those activities which are characteristic of intelli- 
gent behavior. We require memory, discrimination, associa- 
tion, judgment, concept formation, and the others, and 
differences in the capacity for carrying on these processes 
have a bearing on the level of one’s intelligence. 

While this is true, and we cannot select any one mental
process as being identified with intelligence, we do find that
the measures of intelligence place more emphasis on
certain classes of mental processes than on others. Terman
expresses this fact, perhaps with some exaggeration, when he 
writes, “An individual is intelligent in proportion as he is 
able to carry on abstract thinking.”¹ Another way of putting
the matter is to say that intelligence involves chiefly the
higher mental processes. It depends very little on sensory 
keenness and motor dexterity, more on discrimination and 
readiness of association, still more on abstraction, general- 
ization, reasoning and the ready grasp of unfamiliar and 
complicated relationships. Tests which seem to require 
such mental processes as those at the end of the preceding 
list have proven to be the best measures of intelligence. 

Can we now penetrate beneath this descriptive level and 
find any principle, physiological or psychological, which will 
be more inclusive than any one of the mental processes into 
which mental life is ordinarily divided, and which will re- 
present to some degree the unity of intelligence which seems 
to be indicated by the statistical studies? Such a principle, 
if it can be found, should meet several requirements; it should 
fit the description of intelligence which has been derived from 
a survey of successful intelligence tests; it should agree with 
what we know about the evolution of intelligence; it should 
harmonize with the facts of correlation; and it should cor- 
respond with what we know of the structure of the nervous 
system. 

¹ L. M. Terman. “Intelligence and its Measurement: A Symposium”;
in Journal of Educational Psychology, vol. 12, p. 128. 1921.

Such a principle has been proposed by Spearman. It is
his quantitative principle. He describes this principle in his
most recent formulation in the following words:

“G” measures something of the nature of an “energy” derived
from the whole cortex or wider area of the brain. Correspondingly,
the s’s measure the respective efficiencies of the different
parts of the brain in which this energy can be concentrated; they
are, so to speak, its “engines.” Whenever the mind turns from
one operation to another, the energy is switched off from one
engine to another, much as the power supply of a factory can be
directed, at one moment to turning a wheel, at the next to heating
a furnace, and then to blowing a whistle.¹


Spearman’s principle was formulated particularly to ex- 
plain the statistical facts of correlation. The energy and the 
engines constitute the two factors which seem to be de- 
manded by the hierarchy of mental capacities. Let us first 
inquire whether it is a good explanation. It would be ap- 
plied, in detail, in some such fashion as this. A particular 
test, say an opposites test, correlates closely with another 
test, say a completion test. This is due, by the hypothesis, 
to the supposition that each of these two tests depends 
chiefly on the general factor, namely the supply of energy 
in the individual’s brain, and very little on the special factor, 
namely the structure of the brain in any of its parts. Two 
other tests, for example one of pitch discrimination and one of
accuracy of movement, correlate scarcely at all. This is 
because the performance in these tests depends almost 
entirely on the structure of certain areas in the brain and 
scarcely at all upon the supply of energy. That is, the 
efficiency of the individual in some kinds of mental opera- 
tions depends upon energy supply, while the efficiency of 
other operations depends upon structure. 

If this hypothesis is correct it may serve to explain the 
facts, but it seems to be unduly complex and rather difficult 
to conceive in its application. Why should an opposites test
depend more on nervous energy than a discrimination test,
or why should the effective functioning of certain of the
neurones of the brain depend more on the amount of energy
available than that of others? This seems to the writer a serious
difficulty in the hypothesis. A single principle of explanation
would be more tenable.

¹ C. Spearman. “Some Issues in the Theory of ‘G.’” Op. cit.

Again, the description of the general factor as quantity 
of energy does not seem very well to meet the other require- 
ments of a satisfactory hypothesis. There seems no reason 
to think that greater supply of nervous energy would 
enable a person to perform the higher intellectual tasks as 
distinguished from the lower ones, or to make novel adapta- 
tions as compared with accustomed ones. It is true, prob- 
ably, that new adaptations require more energy than do 
old ones, but it does not follow that accustomed adaptations 
are not better made with greater energy than with less. 

Neither does the hypothesis fit any known facts concern- 
ing the evolution of intellect. Higher brains are character- 
ized primarily by greater complexity of structure than are 
lower brains. Similarly, better brains at any particular 
evolutionary level are ordinarily supposed to be better 
because they permit more complex organization of nervous 
impulses, not because they possess more energy. Physio- 
logically, learning is supposed to consist of the possibility of 
organizing groups of neurones and to depend on the capacity 
of the brain for forming such organized groups. 

The preceding criticism has already pointed the way to a 
different hypothesis, and one which, in the view of the writer, 
meets more nearly the requirements. Psychologically, de- 
grees of intelligence seem to depend on the facility with 
which the subject-matter of experience can be organized into 
new patterns. This rearrangement of thought material is 
what characterizes particularly the higher mental processes. 
It is not identified with any one of them, but it underlies 
them all. It fits admirably the behavioristic description 
of intelligence as the capacity for adaptation to novel 
situations. 

Physiologically, this hypothesis means that intelligence
depends upon the rearrangement of association patterns
among the neurones — the formation of new paths of dis- 
charge. It is a familiar notion that superior brains, whether 
in the comparison of evolutionary stages or of individuals, 
are characterized by complexity of potential association- 
forming. This view fits very well the prevailing conception 
of the place of the synapses, or points of connection between 
neurones, in learning. It suits the fact that higher brains 
differ from lower brains chiefly in the size of the association 
areas — the areas which do not mediate particular sensa- 
tions or movements, but make possible an indefinitely com- 
plex series of associations between sensory and motor experi- 
ences. The hypothesis, then, meets well the demands of the 
description of intelligence and of the physiology and evolu- 
tion of the brain. 

The final test is the suitability of the hypothesis to the 
facts of correlation. Here again it seems to fit very well. 
The capacities which have low correlation with one another 
are the sensory and motor capacities. These are the capacities
which depend to a large degree on the structure of particular
areas of the brain and their related peripheral organs.
These various areas and organs might well vary from each 
other in structure quite widely, so that a person who was 
equipped to make very fine discrimination among visual 
sensations might not be especially capable in auditory dis- 
crimination, and so on. 

But when we come to the mechanism by which new pat- 
terns are formed among the data of experience which are 
furnished by the various senses or by the individual’s activi- 
ties, the capacity of the brain may very well be a highly 
general one. The seat of this process, as we have seen, is 
probably the large association areas. We may conceive that 
these areas make possible and easy great complexity of 
organization or pattern forming, either because the neurones
themselves are numerous or spatially well arranged, or
because the structure of their synapses is such as to permit 
easy modification of the resistances to the nervous current 
and thus allow new connections to be made. It seems quite 
possible that brains may differ from each other very greatly 
in this general characteristic of facility in pattern formation, 
due to some such detailed qualities as these, independently 
of their capacity to perform the specialized functions con- 
nected with the particular sensory or motor areas. If this 
is the case we have the condition necessary to produce the 
types of intercorrelations which are found to exist. 

Our analysis points to some such formulation as the fol- 
lowing. Intelligence is represented in behavior by the capa- 
city of the individual to adjust himself to new situations, to 
solve new problems, to learn. On the side of descriptive 
psychology, intelligence is exhibited especially by capacity 
for carrying on the higher mental operations, for abstract 
thought, for dealing with symbols, for generalizing and for 
reasoning. If we analyze the types of operations which 
characterize intelligence we discover an underlying principle 
which fits both the psychological and physiological condi- 
tions. According to this principle degrees of intelligence 
are determined by the general capacity of the psycho- 
physical organism for the formation of new patterns among 
the elements of experience.

Hollingworth, H. L., 29, 420, 424.

Holzinger, K. J., 256, 266, 271, 274, 
448, 478. 

Horizontal classification of pupils 
in school, 23. 

Hunter, W. S., 463. 

Hypothetical growth curves, 278,
279.


Illinois Examination, the, 172. 
Immigrants, average scores of, 460; 
tests of various groups, 463, 470, 
Impulses, coérdination of, in tests of 

will temperament, 198. 
Imputed intelligence, 
with Binet’s tests, 73. 
Indians, scores made by, 463. 
Individual differences, early studies 
Ole, SPH, 
Individual pupils, administrative 
use of mental tests with, 381. 
Individual scores, tables of, 306, 307. 
Infant scale, the, 135. 
Inheritance of differences, study of,
34,
Inhibition, capacity for, in tests of
will temperament, 197. 

Intellectual capacity, development 
of specialized tests, 122; coefficient 
of, 134; general, 233; level of, 
relation to age of maturity, 346- 
48; variability in succeeding 
ages, 349-54; nature of, 476-91; 
relation to delinquency, 427-42. 

Intelligence quotient, the, 98, 276- 
85, 289-91, 304-20; table of dis- 
tribution, 309; histogram of, 312; 
percentile curve of, 315; correla- 
tion table, 317; measures of varia- 
tions in, 344-46; and mental ages, 
354-57. 

Intelligence ratings, distribution in 
typical army groups, 161. 

Intelligence test, sample, 3-12; 
successful because of multiplicity 
of individual tests, 82; important 
facts concerning, 181-86; cor- 
relation with college achievement, 
372; interpretation of, 444-74. 

Intelligence and achievement, meas- 
ures of relation between, 285. 

Intercorrelation of mental tests, 
236; of tests given by Burt, 75; 
correlation, relation with, 239. 

Interest, emotional tone and inter- 
est, test of, 206. 

Interests, social, 210. 

International Test, the, 172. 


Interpretation of intelligence tests,
444-75; two fundamental problems,
445; differences between
vocational groups, 452; differences 
between geographical groups, 455; 
differences between racial groups, 
461; groups with various amounts 
of schooling, 465; various group 
differences, 466; influence of edu- 
cation and environment, 470; in- 
fluence of native capacity, 473. 

Interpretation of statistical studies, 
in relation of intelligence to de- 
linquency, 439. 

Interrelationship of mental traits, 
27. 

Iowa Studies in Child Welfare, 210. 

Items of a test, problems relating to
selection and organization, 247-62;
length of a test, 256; form of or- 
ganization of items of a group 
language test, 257; modes of or- 
ganization of items of non-lan- 
guage test, 261. 


James, William, 235, 483. 
Jastrow, J., 38. 

Johnson, W. H., 361. 
Jones, E. S., 410, 419. 
Judgment test, 205, 213. 


Kelley, T. L., 124. 

Kingsbury, F. A., 399. 
Kingsbury Test, the, 167, 168. 
Knox: Ho Ayal: 

Koerth, W., 250. 

Kohs, S. C., 125.

Kornhauser, A. W., 399. 
Kraepelin, E., 53. 

Krueger and Spearman, 53, 62. 
Kuhlmann, F., 89, 354. 


Language test, form of organization 
of items, 257. 

Length of scale, a criterion in the 
choice of a test, 175. 

Length of a test, the, 256. 

Letter ratings in army Alpha, 149. 

Level of intelligence, relation to age 
of maturity, 346; relation to 
achievement, 378-81. 

Liao, S. C., 217. 

Limit of mental growth, age, 357-64.

Link, H. C., 414. 

Lippmann, W., 149. 

Livingstone, W. H., 350. 

Local norms, use of, 302. 


Machines, operation in a complex 
situation, 416. 

Maddox, C. R., 470. 

Manual of Mental and Psychical 
Tests, Whipple, 68. 

Material, completeness and convenience,
a criterion in the choice
of a test, 174.

Materials, test, for use with Stanford
revision, 100.

Mathematical Aptitude Test, Stenquist,
125.

Maturity, age of, relation to level 
of intelligence, 346-54.

Maze test, Porteus, 125, 126. 

McComas, H. G., 418. 

Meaning of mental tests, 13. 

Measurement of Intelligence, The, 
Terman, 94. 

Measurements, significance of, 21. 

Measures derived from the age 
scale, 97; relative standing, 275; 
between intelligence and achieve- 
ment, 285. 

Meier, N. C., 200. 

Memory, growth curves in, 330;
relationship to estimates of ability,
38; Binet’s four tests of, 50.

Memory tests, 127; Bolton’s, 38. 

Mental age, concept of, 84; and in- 
telligence quotient, 354-57. 

Mental-age method, of scaling diffi- 
culty of parts of a test, 255. 

Mental alertness tests, 18. 

Mental capacity, tests for analysis 
of, 105-12. 

Mental-Educational Survey Test, 
the, 173.

Mental growth, character of, 27, 
365; bearing of mental tests upon, 
327-64; problems concerning men- 
tal growth, 327; form of men- 
tal-growth curve, 329; relation 
between growth curves of indi- 
viduals, 343; relation between 
level of intelligence and age of 
maturity, 346; variability of in- 
telligence in succeeding ages, 
349; evidence from intelligence 
quotients and mental ages, 354; 
age limit of mental growth, 357. 

Mental span, in army scale Alpha, 
138. 

Mental 
260. 

Mental tests, present status of, 1-29; 
recent origin, 1; beginning of de- 
velopment, 2; a simple intelligence 
test, 3-13; reliability and mean- 
ing of mental tests, 13; definition 
and classification of tests, 16; uses
of tests, 22; correct answers to
sample test, 28, 29; uses in the 
army, 159; criteria for choice of, 
173; scores, 263. 

Mentality Tests for Superior Adults, 
Roback, 171. 

Miles, W. R., 124. 

Miller, W. S., 281. 

Miner, J. B., 431. 

Miniature test, the, 412. 

Monroe, W. S., 286. 

Moody, F. E., 424. 

Moore, H. T., 205. 

Moral attitude, tests of, 213. 

Morgan, E., 209.
tests of, 123.

Motor functions, tests dealing with, 
53. 

Motor impulsion, in tests of will 
temperament, 196. 

Motor index, comparison with class 
standing, 42. 

Motor tests, 70. 

Movement, speed of, in tests of will 
temperament, 194. 

Muensterberg, H., 205, 412. 

Mull, H. K., 209. 

Multiple choice test, the, 258, 261. 

Murchison, C., 428, 432, 435, 437. 

Murdoch, K., 464. 

Muscio, B., 399. 

Music Test, Seashore, 118, 228. 

Myers Mental Measure, 167, 462.

National intelligence test, the, 3,
165, 336, 338, 347, 352, 359, 462. 

Nature of intelligence, 476-91. 

Negroes, scores made by, 461. 

New Jersey Composite Test, the, 
173. 

Non-language test, illustration of, 
166; mode of organization of items, 
261. 

Norms, criteria in the choice of a 
test, 177; comparison with scores 
made by children in intelligence 
tests, 379. 

Norms and scores, problems relat- 
ing to, 263. 

Nutt, H. W., 64.

Occupations, median scores in, 452;
intelligence of, 454. 

Oehrn, A., 53. 

Office work, 419-22. 

Officers, army, intelligence of, 162. 

Omnibus Achievement Test, 173. 

Omnibus Mental Test, 173. 

Opposition, resistance to, in tests of 
will temperament, 197. 

Organization and selection of items 
of a test, 247. 

O’Rourke, L. J., 413, 422. 

Otis, A. S., 164, 170, 173, 313. 

Otis Advanced Examination, 
347. 

Otis Group Intelligence Scale, 337, 
339, 340, 359. 

Otis intelligence quotients, 317. 

Otis Self-Administering Test, 
256, 463. 

Otis and Arithmetic scores, correla- 
tion between, 325. 

Otis and Burgess scores, correlation 
table showing relation between, 
323. 

Otis and Gray scores, correlation 
table showing relation between, 
oven 

Otis and Haggerty scores, correlation
table showing relation between,
320.

the,


Parker, B., 119. 

Paterson, D. G., 117, 333. 

Pearson, Karl, 54. 

Percentile curve, the, 313. 

Percentile rank, 284. 

Perception tests, early, 53. 

Performance scale examination, 158. 

Perrin, F. A., 128. 

Personal equation, the, 32. 

Personality traits, tests of, 191-225; 
tests of will temperament, 192; 
test of emotional tone, tempera- 
ment and interest, 206; tests of 
moral attitude or judgment, 213; 
tests of esthetic sensibility, 222. 


Picture completion puzzle, age
progress curve in, 334.

Pictures, interpretation of, 262.


Pillsbury, W. B., 467.

Pintner, R., 114, 117, 173, 333, 378,
431. 

Pintner Non-Language Test, 167. 

Poffenberger, A. T., 420. 

Point scales, early development of, 
131-63; the first point scales, 131; 
the Herring revision, 135; the 
U.S. army mental tests, 136; the 
army scale Alpha, 137; the army 
scale Beta, 153; the performance 
scale examination, 158; the use of 
mental tests in the army, 159. 

Point score, the, 265. 

Porteus, S. D., 125, 153. 

Porteus Maze Test, the, 126.

Pre-adolescent scale, the, 134.

Pressey, L. W., 165. 

Pressey, S. L., 207, 217, 377, 403,
457. 

Pressey Cross-Out Test, 336, 351, 
359. 

Pressey Group Point Scale, 337.

Pressey Group Scale of Intelligence, 
352. 

Pressey Senior Classification Test, 
339.

Probable error, the, 33; of coefficient
of correlation, necessity for
calculating, 62.

Proctor, W. M., 371, 390. 

Product-moment method, 61.

Professional school, selection of
applicants for, 397.

Professions, tests in, 423-25. 

Profile tests, 117. 

Prophecy law, Spearman’s, 257. 

Psychograph, Rossolimo’s, 119. 

Psychological examinations for col- 
lege freshmen and high school 
seniors, 170; in the United States 
Army, 237. 

Psychological Examining in the
United States Army, 362.

Pursuit meter, the, 124. 

Puzzle box, of the Healy-Fernald 
series, 110.

Pyle, W. H., 114, 465.

Race, norms for, 297.

Racial groups, differences between, 
461. 

Ralston, R., 453. 

Rand, G., 282, 290. 

Rank method, the, 61. 

Ratio, accomplishment, 26, 286. 

Raubenheimer, A. S., 222.

Raw coefficient, the, 67. 

Raw score, the, 263. 

Reaction, to environment, 112; 
sensory-motor, 123; speed and 
fluidity of, in tests of will tempera- 
ment, 194; decisiveness of, in tests 
of will temperament, 195; per- 
sistence of, in tests of will tempera- 
ment, 197. 

Reaction time, the, 32, 51. 

Rearrangement, in test, 261. 

Reavis, W. C., 208. 

Record Booklet, Terman, 94. 

Relative ability, calculating, 178. 

Relative standing, measures of, 275. 

Reliability of mental tests, 13; 
relation to length of test, 256. 

Reliability coefficient, the, 60, 68. 

Response, simplicity of, a criterion 
in the choice of a test, 176. 

Results of tests, how to tabulate, 
304-26; tabulating the scores, 304; 
the distribution table, 308; the 
percentile curve, 313; correlation, 
316. 

Retardation, 97. 

Right-wrong formula, the, 268. 

Roback, A. A., 171. 

Rogers, H. W., 420. 

Rossolimo, psychograph of, 119. 

Routine operation, in the factory, 
414-16. 

Ruch, G. M., 250. 

Rugg, H. O., 255.


Saam, T., 382. 

Sample intelligence test, a, 3-12. 

Scale, Alpha, 3, 187, 239, 249, 259, 
262, 274, 339, 371, 395, 405, 408, 
423, 428, 434, 452, 455, 459, 461,
465, 466, 468; Binet, 2, 82; performance,
158; age, 81-104; group
point, 164-90; point, 131-63. 

Schmitt, C., 112. 

Schooling, different amounts of, 465. 

Score, raw, 263. 

Score, true, formula for finding, 208. 

Scores, range of letter ratings in 
army Alpha, 150; correlation 
tables, 322, 323; individual, 306-07;
standard deviation in successive
ages, 353; table of distribution,
310; tabulating, 304. 

Scores and norms, problems relating
to, 263; mental test scores, 263;
the raw score, 263; accuracy of the
score, 265; sources of error, 265; 
treatment of wrong answers, 267; 
weighting test scores, 272; meas- 
ures of relative standing, 275; 
measures of the relation between 
intelligence and achievement, 285; 
norms, 291; grade norms, 294; 
norms for sex, race and social 
groups, 296; the use of local norms, 
302. 

Scoring, ease and definiteness, a 
criterion in the choice of a test, 177. 

Scott, W. D., 419, 423. 

Scrambled Alpha test, the, 170. 

Seashore, C. E., 41, 411, 425. 

Seashore audiometer, the, 69. 

Seashore Music Tests, 118, 228. 

Selection for special classes, use of 
tests in, 389-91. 

Selection of tests, 
guidance, 409, 410. 

Selection, vocational, 25, 399-426. 

Selection and organization of items 
of a test, 247. 

Sensibility, esthetic, tests of, 222. 

Sensory discrimination tests, 70, 
Bx, TOS, ASE 

Sensory keenness, Seashore’s tests 
of, 41. 

Sensory perception, tests for, 51. 

Sensory tests, recent experimenta- 
tion in, 123. 

Sex, norms for, 296. 

Sentence completion, grade progress 
curves in, 335, 336.

Ship test, the, 116.

Shuttleworth, F. K., 212.

Similar questions, in test, 260. 

Simon, T., 82, 84. 

Simpson, B. R., 77. 

Smedley, F. W., 329. 

Social attitudes, 210. 

Social norms, 300. 

Sommermeier, E., 463. 

Spearman, C., 41, 53, 60, 62, 66, 234, 
238, 256, 265, 476, 478, 479, 483, 
488. 

Spearman and Krueger, 74. 

Special classes, uses of tests in select- 
ing children for, 389. 

Specialization, of general intellectual 
capacity, 244. 

Specialized tests of intellectual
capacity, development of, 122;
selection of subject-matter, 229.

Speed of performance, importance 
of, 253.

Spencer, P. L., 274. 

Standardization, necessity for, 68. 

Standing, comparison with general 
motor index, 42; correlation with 
a number of mental tests, 47;
correlation between various college 
subjects, 48; measures of, 275; 
in comparison with teachers’ esti- 
mates of ability, 39. 

Stanford Achievement Test, the, 173. 

Stanford revision of the Binet scale, 
90, 296; derivation of, 94; use of, 
100. 

Stanton, H. M., 425. 

Statistical procedure, 
criticism of, 60. 

Statistical studies, in relation of in- 
telligence to delinquency, 429-39. 

Stenquist, J. L., 125, 159. 

Stern, William, 54, 98, 235, 485. 

Stratton, G. M., 418. 

Strong, A. C., 458. 

Studies of single tests, correlation, 
69.


Spearman’s 


Subject-matter of tests, selection
and other problems, 226-46;
selection, 227; tests of special
capacity, 228; general intellectual
capacity, 233.

Sunny, D., 462.

Superfluous part, test designating, 
260, 261. 

Survey of group point scales, 164-90. 

Sylvester, R. H., 124. 

Symonds, T. W., 287. 


Tabulating results, directions as a 
criterion in choice of a test, 179; 
methods of, 304-26. 

Teagarden, F. M., 339, 353. 

Technique of administration of 
tests, Burt’s, 71. 

Technique and theory of mental 
tests, 226-303; subject-matter, 
226-46; selection and organiza- 
tion of a test, 247-62; problems 
relating to scores and norms, 263- 
303. 

Temperament, emotional tone, and 
interest, test of, 206; will, tests of, 
192. 

Terman, L. M., 94, 213, 222, 292, 
357, 371, 394, 453, 485, 487. 

Terman, L. M., and Childs, H. G., 
90. 

Test IV, Psychological Examination, 
L. L. Thurstone, 4. 

Test groups, 106. 

Test material, Terman’s, 100. 

Tests for the analysis of mental 
capacity, 105-30; Burt’s classification
of, 70; Healy-Fernald,
109; Columbia University, 44-49; 
uses of, 22; personality traits, 
191-225; profile, 117; correlation 
studies of, 69. 

Tests, mental, reliability and meaning
of, 13-16; definition and classification
of, 16-22; uses of, 22-29;
early experimentation with, 32-57; 
application of the correlation 
method, 58-80; age-scales, 81- 
104; analysis of mental capacity, 
105-30; Healy-Fernald test group, 
107-14; other test groups, 114- 
17; profile tests, 117-22; development
of specialized tests of intellectual
capacity, 122-30; early
development of point scales, 131-63;
in U.S. army, 136, 159; army
Alpha scale, 137; army Beta scale,
153; survey of group point scales,
164-90; tests of personality traits,
191-225; technique and theory of,
226-303; how to tabulate results, 
304-26; bearing upon mental 
growth, 327-64; educational uses, 
365-98; application to vocational 
guidance and selection, 399-426; 
relation of intelligence to delin- 
quency, 427-43; interpretation 
of intelligence tests, 444-75; the 
nature of intelligence, 476-91. 

Theory and technique of mental 
tests, subject-matter, 226-46; 
selection and organization of 
items of a test, 247-62; problems 
relating to scores and norms, 263-303.

Thomas, J. B., 457. 

Thorndike, E. L., 223, 361, 373, 474, 
480, 485. 

Thorndike College Entrance In- 
telligence Examination, 372. 

Thorndike Intelligence Examina- 
tion, the, 170. 

Thorndike Non-Language Test, 167. 

Thurstone, L. L., 4, 127, 169, 170,
270, 420, 421, 483. 

Toops, H. A., 287. 

Trabue, M. R., 334. 

Training, effect on behavior, 19. 

Traits, interrelationship of, 27. 

Traits, personality, tests of, 191- 
225. 

True correlation, the, 68. 

Two-factor theory, the, 479. 

Types, mental, investigation of, 28. 


Uses of mental tests, 22; basic facts 
underlying the application of 
tests to education, 365; mental 
tests in school, 378; general in- 
telligence level and its relation to 
achievement, 378; administrative 
use of mental tests in dealing with 
individual pupils, 381; mental 
tests as an aid in the determina- 
tion of the right time to enter 
school, 382; classification into 
ability groups, 383; use of tests
in selecting children for special
classes, 389; use of tests in educa- 
tional guidance, 390; use of tests 
in maintaining adjustment of 
pupil to work, 395; selection of 
applicants for college or profes- 
sional school, 397. 


Validation of vocational tests, in 
vocational guidance and selection, 
402-04. 

Validity of intelligence tests, question 
of, 241. 

Value of a test, external criteria of, 
179. 

Variability in intelligence in suc- 
ceeding ages, 349-53. 

Variations, among correlation coef- 
ficients, causes of, 63. 

Vertical classification of pupils in 
school, 23. 

Vocabulary test, 93. 

Vocational groups, differences be- 
tween, 452. 

Vocational guidance, tests for, 24, 
25; application of mental tests to, 
399-426; use of tests from view- 
point of employer, 399; use of 
tests from viewpoint of indi- 
vidual, 400; validation of voca- 
tional tests, 402; types of tests of 
vocational aptitude, 404; general 
intelligence tests, 405; tests se- 
lected by the empirical method, 
409; analysis of ability, 410; 
complex aptitude tests, 411; activity
of the job as a test, 413;
routine operation, 414; operation
of machines, 416; clerical and 
office work, 419; the professions, 
423; summary, 425. 

Vocational tests, validation of, 402- 
04; use for various types of jobs, 
414-16. 

Voelker, P. F., 219. 

Volitional perseveration, in tests of 
will temperament, 198. 

Volometer, the, 204. 

Voluntary attention tests, 70. 


Washburn, M. F., 209, 224. 

Washburne, C. W., 387. 

Weber-Fechner law, the, 33. 

Weight discrimination, 
curves in, 331. 

Weighting test scores, 272. 

West, P. V., 269, 274. 

Whipple, G. M., 69, 164. 

Will temperament, tests of, 192.

Wissler, Clark, 37, 44. 

Wood, B. D., 373, 398. 

Woodrow, H., 280, 297. 

Woody Arithmetic Test, 379. 

Woolley, H. T., 115, 285. 

Woolley-Fischer tests, 410. 

Wrong answers, treatment of, 267. 

Wundt, Wilhelm, 1. 

Wyman, J. B., 213.

Yerkes, R. M., 118, 131, 134, 137,
250, 405, 413, 458, 461, 469. 

Yerkes Point Scale, 134. 

Yoakum, C. S., 137, 405, 469.